









FIRST 


COURSE 

IN PROBABILITY
AND STATISTICS 


by J. Neyman

The University of California, Berkeley 


HENRY HOLT AND COMPANY 

New York 



Copyright, 1950 

by Henry Holt and Company, Inc. 
Printed in the United States of America 



Preface 


The present book is a revision of a mimeographed syllabus first issued 
in 1947 and is intended to provide sufficient material for a one-semester 
basic course for beginners, with enough additional material to give the 
instructor some liberty of choice. Three different categories of students 
are contemplated: (i) students who would like to take just one course in
mathematical statistics for purposes of general education, (ii) prospective
future mathematical statisticians and (iii) students who specialize or
intend to specialize in one of the fields of application and need mathematical
statistics as a useful tool in their own studies.

Naturally, the statistical studies of students in categories (ii) and (iii) 
cannot be limited to the basic course alone and the material offered in 
subsequent courses must be adjusted to the varying needs and interests. 
The purpose of the basic course, offering essentially the same material for 
all groups of students, is to introduce the most fundamental concepts of 
modern statistical theory and to connect these concepts with as many fields 
of application as practicable. With this in mind and assuming that the 
reader's mathematical education is limited to high school algebra, the book
emphasizes the basic concepts of statistical theory treated on an elementary
level.

Experience at the University of California has shown that there is a 
difference between the students of categories (i) and (ii) on the one hand 
and those of category (iii) on the other. Students of the first two categories 
attending the basic course in statistics are ordinarily freshmen or sophomores
and have very little mathematical training. On the other hand,
students of category (iii) are mostly seniors or graduates, with more extensive
mathematical preparation, including some calculus, and, what is
much more important, with substantially higher intellectual maturity.
Thus, while all the groups have to begin at the beginning and have to
cover essentially the same ground, instruction is easier with category (iii)
than with the others, so that it is possible to cover more material and
with more detail. Hence, the intended single basic course was split into
two courses, one for students of categories (i) and (ii) and the other for 
students of category (iii). 

In order to make the book suitable for both courses, certain requirements
of elasticity of presentation had to be met. Thus, while the material
offered represents a structural whole intended for beginners in statistics 
otherwise intellectually mature, certain sections are marked with stars 
indicating that, with a less proficient class, these sections can be omitted 
without disrupting the continuity of thought. 



As conceived in this book, the theory of statistics is a section of the 
theory of probability. Thus, the study of statistics, however elementary, 
must be preceded by that of probability. Accordingly, a substantial part
of the book, namely Chapters 2 and 4, is given to an elementary presentation
of the calculus of probability.

Ordinarily, elementary theorems on probability are illustrated in games 
of chance. A few such illustrations are undoubtedly useful. However, it 
has been the author's experience that the students rapidly tire of the artificial
character of games of chance and lose interest in them. On the other
hand, they show enthusiasm in studying simple probabilistic problems 
related to questions of general interest, such as those of public health, 
genetics, etc. This explains the contents of Section 2.8 and Chapter 3. 

Elements of the theory of statistics itself, more specifically, of the theory 
of testing statistical hypotheses, are given in Chapter 5, which is the 
heart of the whole book and its justification. In elementary terms Chapter 
5 summarizes the basic concepts of the theory introduced, some 20 years 
ago, in cooperation with Egon S. Pearson. Until recently, these concepts 
were taught in graduate courses only. The reason, probably, is that they 
were originally introduced in connection with various problems of practical 
importance, the handling of which requires substantial mathematical background.
However, taken by themselves and illustrated in simplified
examples from various fields, the basic concepts of the theory of statistics
are simple and do not require for their understanding more mathematics
than is usually acquired in high school. Also, the concepts are basic in
enabling the student to understand the outcomes of statistical inquiries 
which he may meet in his university career and later in everyday life. 
Thus, Chapter 5 is the main part of the book and much of the rest is 
preparation for it. 

Since Chapter 5 is quite some distance from the beginning of the material 
intended for systematic instruction in Chapter 2, the reader of the book 
may become impatient to learn what mathematical statistics really is. 
Therefore, it was thought necessary to include the introductory Chapter 1. 
This is intended to explain the scope of the theory of statistics, the scope 
of probability and the relation between the two. To do this rapidly, it was 
necessary to anticipate a number of definitions and to appeal to the 
reader's intuition and complacency. In spite of the difficulties involved 
and in spite of the resulting shortcomings, it is hoped that Chapter 1 will 
be found intelligible and informative. It is likely to provide material for 
an introductory lecture or two. 

It is a pleasure to acknowledge the author's indebtedness to his friend
and colleague, Professor Erich L. Lehmann, who conceived the idea that
the basic concepts of the modern statistical theory can and should be
taught on an elementary level. Also, he was the first to try this idea in 
class, with considerable success. 





The author also takes great pleasure in acknowledging invaluable help 
obtained from a number of friends and colleagues. In particular the author 
is deeply indebted to Douglas G. Chapman, Harry M. Hughes, Carl F. 
Kossack, Kenneth May, Elizabeth L. Scott and Esther Seiden, who read 
the manuscript, tried it in class and offered a number of constructive 
criticisms. In addition, they invented many interesting illustrations. 
Special thanks are due to my colleague, Dr. Elizabeth L. Scott. At the 
time when the book was written, Miss Scott was engaged in teaching a 
course parallel to that taught by the author and covering the material 
offered in the book. Thus, just as soon as a section was written and first 
tried in class, it was subjected to Miss Scott’s criticism, frequently relentless.
Time will show whether or not the book is clear and useful.
However, it is safe to say that, without Miss Scott’s cooperation, it would 
have been much worse than it is. 

J. N. 

Statistical Laboratory 
University of California 
Berkeley, California 
January, 1950 




Contents 


Chapter 1. Introduction: Scope of the Theory of Probability
and Statistics . 1

1.1 Concept of inductive behavior . 1

1.2 Pattern of development of mathematical sciences . 2

1.3 Scope of probability and statistics . 3


Introduction—Probabilistic problem of Chevalier de Méré—
Statistical problem of Chevalier de Méré—Comparison—Choice
of rules—Randomness—Rule of inductive behavior—Statistical
decision function—Set of admissible hypotheses—Performance
characteristic—Scope of mathematical statistics—Historical
note—Problems and Exercises—References


Chapter 2. Probability . 15

2.1 Basic concepts . 15

Use of the word “probability”—Definition of probability

2.2 Illustrations . 16

2.3 Relative or conditional probability . 20

Definition—Example

2.4 Further illustrations . 21

A controversial problem—Performance characteristic of rule (i)
of Chevalier de Méré—Problems and Exercises

2.5 Certain symbols and some formulae from elementary algebra . 31

Summation symbol—Multiplication symbol—Factorial—Arrangements,
permutations, and combinations—Formula for the
number of arrangements—Formula for the number of permutations—Formula
for the number of combinations—Newton's
binomial expansion—Problems and Exercises

2.6 Fundamental theorems . 44

Some definitions and operations relating to properties—Problems
and Exercises—Addition theorem on probabilities—Probability
of a sure property—Multiplication theorem on probabilities—A
theorem on relative probabilities—Stochastic independence
of properties—Theorem on independence—Intuitive difficulties—Complete
independence of properties—Second theorem
on independence—Multiplication theorems for completely independent
properties and sets of properties—Problems and
Exercises

2.7 The problem of bag and boxes . 65

Problems and Exercises 




2.8 Evaluation of competing risks . 69

Notion of competing risks—A mathematical model—Geometric
progression—Relations between the net and crude rates of risks
—Problem of competing risks. More realistic treatment—Problems
and Exercises


Chapter 3. Probabilistic Problems in Genetics . 96

3.1 Outline of the laws of heredity . 96

Introduction—Notation—Axioms

3.2 Probabilities of inheritance from parents . 102

Inheritance of a single pair of genes—Inheritance of traits depending
on a single pair of genes—Relative probabilities of
stated genetical compositions given the dominant trait—Inheritance
of several unlinked pairs of genes—Two pairs of linked
genes. Probabilities relating to reproductive cells—Two pairs of
linked genes. Probabilities of inheritance from parents

3.3 Study of successive generations . 114

Introduction—Panmixia—Problems and Exercises—Successive
generations under panmixia with no selection. Case of one pair
of genes—Successive generations under panmixia and mass selection
against recessives—Problems and Exercises—Brother-sister
mating—Problems and Exercises—Successive generations under
panmixia with no selection. Case of two pairs of genes—Stabilization
of the distribution of genetical types in successive generations—Stable
distributions—Example—Problems and Exercises—References


Chapter 4. Random Variables and Frequency Distributions . 164

4.1 Random variables . 164

Concept of a function—Problems and Exercises—Random variable—Joint
distribution of several random variables—General
properties of distribution—Concept of the most probable value

4.2 Binomial variable . 179

Definition—Frequency function of the binomial variable—
General properties of the binomial distribution—Problems and
Exercises—Elimination mating—Problems and Exercises

4.3 Weighted binomial variable . 195

Frequency function of the weighted binomial—Problem of diagnosis—Problems
and Exercises

4.4 Hypergeometric distribution . 200

Hypergeometric variable—Frequency function of the hypergeometric
variable—Problems and Exercises

4.5 Limits of the hypergeometric and the binomial frequency functions . 204

Four useful formulae on limits—Problems and Exercises—
Binomial distribution as the limiting form of the hypergeometric
—Poisson Law as the limit of the binomial—Problems and
Exercises—Normal Law as the limit of the standardized binomial—The
use of the Normal integral as an approximation to
the binomial probability—Geometric interpretation of the
theorem of Laplace

4.6 Proof of the Theorem of Laplace . 234

Prerequisites from calculus—General idea of the proof—Lemma
of Duhamel—Proof of Laplace's theorem—Problems and
Exercises—References

Chapter 5. Elements of the Theory of Testing Statistical
Hypotheses . 250

5.1 Statistical hypotheses and their tests . 250

Basic ideas—Problems and Exercises—Tests of statistical hypotheses—Errors
in testing hypotheses—Problems and Exercises—Critical
region. Level of significance. Power function of a
test

5.2 Statistical hypotheses. Illustrations . 268

Screening for tuberculosis—Problem of the Lady tasting tea—
Relation between theory and reality—Screening for tuberculosis.
Bivariate case—Problems and Exercises

5.3 Simple hypothesis H tested against a single simple alternative H . 304

Best critical regions—Method of constructing best critical regions—Short
cut in constructing the B.C.R.—Problems and
Exercises—Distribution problem in testing statistical hypotheses
—Problems and Exercises—Standard family of B.C.R.’s—
Problems and Exercises

5.4 Test of a simple hypothesis against a composite alternative . 324

Uniformly most powerful tests—Search for U.M.P.C.R. in the
case of a simple hypothesis concerning the binomial distribution
—Problems and Exercises—Use of the normal approximation in
testing hypotheses relating to binomial variables—Problems and
Exercises—Hypotheses relating to the hypergeometric random
variable. Industrial sampling inspection—Problems and Exercises—Hypotheses
relating to the hypergeometric random
variable. Fish-tagging experiments—Problems and Exercises—
Lambda-principle of testing hypotheses—Problems and Exercises—References


Appendix—Tables of the Normal Integral

Direct Table of the Normal Integral . 345

Inverse Table of the Normal Integral . 346

Index of Names . 349

Index of Terms . 351












CHAPTER 1

Introduction

Scope of the Theory of Probability and Statistics


1.1. Concept of inductive behavior

Claims are occasionally made that mathematical statistics and the theory
of probability form the basis of some mental process described as “inductive
reasoning.” However, in spite of substantial literature on this subject,
the term “inductive reasoning” remains obscure and it is uncertain whether
or not the term can be conveniently used to denote any clearly defined
concept. On the other hand, as was first remarked in 1937 [15]*, there
seems to be room for the term “inductive behavior.” This may be used to
denote the adjustment of our behavior to limited amounts of observation.
The adjustment is partly conscious and partly subconscious. The conscious 
part is based on certain rules (if I see this happening, then I do that) 
which we call rules of inductive behavior. In establishing these rules, the 
theory of probability and statistics both play an important role, and there 
is a considerable amount of reasoning involved. As usual, however, the 
reasoning is all deductive. 

Human progress is based on “permanencies” or, rather, on our ability
to detect permanencies both in the objects surrounding us and in changes 
in these objects which we describe as phenomena. 

One of the first “permanencies” noticed may have been the dimensions
of objects, at least of some objects. Thus “dimensions” came to be considered
as properties of objects and soon rules were invented to measure
them. With some more experience it was found that the permanencies
which once seemed firmly established are far from absolute. Thus, the
dimensions of many objects change in time. Upon noticing this, the human
mind set to work to establish “permanencies” in the process of these
changes.

With many phenomena certain permanencies appear quite stable. This
created the habit of regulating our actions in regard to some observed
events by referring to the permanencies which at the particular moment
seem to be established. This is what we call inductive behavior.

Early in human history it was established that rain or snow storms
follow the appearance of heavy clouds. This is one of many permanencies
noted. Although this permanency is not absolute (just as most other
permanencies are not absolute), human beings and also some animals
tend to take cover whenever dark clouds appear in the sky. This is an
example of inductive behavior. Frequently it leads to satisfactory results
but, naturally, not always.

*Figures in square brackets refer to literature listed at the end of the Chapter.

In the seventeenth and eighteenth centuries, a new kind of “permanency”
was detected by gamblers on the continent of Europe and by
Masons in England. The permanencies noted previously were dimensions, 
distances, weights, etc. The newly detected permanency was the relative 
frequency with which a particular result occurs in repeated trials in which 
the outcome of any single trial is unpredictable. 

The first gambler who thought of “loading” a die is the author of the 
concept that the relative frequency with which the die falls with six dots 
up is this die’s permanency. This permanency or, let us say, property is 
measurable and unchangeable to the same extent as the die’s dimensions 
and weight. The Masons in England made a similar discovery in relation 
to the deaths of persons of the same age who were living in similar conditions.

Once these permanencies were noticed, the corresponding abstract concepts
were easy to create. The postulated long-run relative frequency received
the label of probability and found useful applications in forming
our inductive behavior. This was the origin of the calculus of probability 
and of mathematical statistics. 
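The gambler's “permanency” can be made concrete with a small simulation. This is only an illustration, not part of the original argument: it assumes a hypothetical loaded die whose faces are drawn with weights we chose ourselves (six dots up with probability .25, each other face .15) and shows the relative frequency of sixes settling near that postulated value as the number of throws grows.

```python
import random

def relative_frequency_of_six(n_throws, weights, seed=0):
    """Return the relative frequency of "six dots up" in n_throws throws
    of a die whose faces 1..6 are drawn with the given weights."""
    rng = random.Random(seed)  # fixed seed so the illustration is reproducible
    faces = rng.choices(range(1, 7), weights=weights, k=n_throws)
    return faces.count(6) / n_throws

# Hypothetical loading (our assumption): six comes up with probability 0.25,
# each of the other five faces with probability 0.15.
loaded = [0.15, 0.15, 0.15, 0.15, 0.15, 0.25]

for n in (100, 10_000, 100_000):
    print(n, relative_frequency_of_six(n, loaded))
```

Short runs wander noticeably, while long runs stay close to .25; in the book's terms, that stable long-run value is what receives the label of probability.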

1.2. Pattern of development of mathematical sciences

The theories of probability and of statistics are not the only mathematical
disciplines which originated from the demands of practical life. In
fact, a closer inspection reveals that all other sections of mathematics have
the same utilitarian origin, although in some cases their connection with
problems of practical application is very indirect. Usually the development
of a branch of mathematics has the following pattern. First a category of
permanencies is established and this creates a number of problems of inductive
behavior. The next step consists in forming an abstract model of
the phenomena in which the originally somewhat vague permanencies are
elevated to the important role of basic concepts and axioms. Efforts are
made to express them in as precise terms as possible. The abstract mathematical
model is used to deduce various conclusions from the assumed
axioms, and this is where reasoning comes in. These conclusions serve 
partly for verifying the adequacy of the mathematical scheme and partly 
for forming rules of inductive behavior. 

The problems of behavior seem to prevail in the early stages of research; 
the verification of the model comes second. Gradually, however, the
situation becomes reversed and, in addition, a third category of problems
appears in the discussions. These last problems are those dictated by pure 
curiosity as to the consequences of the axioms assumed, as to the mutual 
relations of these axioms to each other, etc. When the problems of the 
third category become prevalent, the particular mathematical discipline 
ceases to be an applied science and becomes a branch of pure mathematics. 

Reviewing the development of modern geometry, it is easy to notice 
that it follows exactly the pattern just described. The first concepts of
length, area, straight line and plane must have arisen from such practical
problems in ancient Egypt as the redistribution of land after each flood
of the Nile. At first the Egyptian recipes for measuring land were crude.
For example, it was considered that the half product of the two smaller
sides of any given triangle equals its area. Later on, however, the Greeks
came into the scene and Euclidean geometry was built up. Although the
results were perfectly satisfactory for the needs of the contemporary surveyors,
it is safe to say that in his work Euclid was motivated by pure
scientific curiosity rather than by practical considerations. In our era, of 
course, the problems relating to surveying do not play any role in what is 
now called geometry. 

1.3. Scope of probability and statistics

1.3.1. Introduction. Just as geometry started its existence from such
questions as “how large is the area of this piece of land,” the theory of
probability was born of the question “how frequently” will this or that
event occur in the course of a long series of trials. In order to answer the
question of the area of a piece of land the geometers require certain data
expressed in geometrical terms, such as the shape of the boundary, perhaps
a rectangle, and the dimensions. Similarly, to answer the question as to
the frequency of an event E, some data are required which must also be
expressed in terms of the frequencies of some related events A, B, C, etc.

Thus, it may be said that the purpose of the classical theory of probability
is to find methods of computing the relative frequencies of some events E from
the given relative frequencies of some related events A, B and C. To begin
with, most of the problems considered were of a utilitarian type. Later on,
more and more attention began to be given to purely theoretical problems, 
and now the theories of probability and statistics appear to be in the 
transitional stage between applied and pure mathematical sciences. 

In recent times, the ideas on the purpose and the scope of the theory
of probability became diversified, with various claims being made relating
to “inductive reasoning” and “intensity of belief.” Several works in this
direction are listed at the end of this Chapter [1; 8, 10]. However, the
present book deals exclusively with the theory of probability from the
classical point of view, as a mathematical model of relative frequencies
observable in long series of trials. It has for its modest purpose the computation
of relative frequencies of some events from the postulated frequencies
of some others. The honor of promoting and popularizing this
point of view on probability is due to the German scholar, Richard von
Mises [13] who also proposed his own definition of mathematical probability.
This definition was refined later by Copeland [2], Wald [22], and
others. While this book follows von Mises in his philosophical outlook on the
theory of probability, the definition of probability given below is different
from that of von Mises.

The mathematical theory of statistics, or mathematical statistics, is a
section of the theory of probability. Each problem of mathematical statistics
as understood in this book is essentially a problem of probability.
However, the reverse is not true. The problems of probability which fall 
in the category of problems of statistics are those most closely connected 
with rules of inductive behavior when the data on which it is necessary 
to act are of a random or, let us say, a probabilistic nature. 

1.3.2. Probabilistic problem of Chevalier de Méré. The following example
taken from the early period of the development of the theory of
probability is intended to explain this point.

Todhunter [21] describes the following incident. Late in the seventeenth
century a French nobleman, Chevalier de Méré, addressed to the famous
mathematician Pascal a question concerning a certain game of dice. The
game consisted of 24 throws of a pair of dice. One could bet even money
either on the occurrence of at least one “double six” in the course of the
24 throws or against it. Some theoretical considerations led de Méré to
believe that betting on the “double six” is advantageous. On the other
hand, the nobleman's empirical trials seemed to contradict this conclusion.

The answer of Pascal was: given that the dice are “fair,” that is to say,
given that each die falls with equal frequency on each of its sides, the
relative frequency of games with at least one “double six” is .491. Thus
it appears that Chevalier de Méré's empirical research, rather than his
theoretical analysis, was correct.
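Pascal's figure follows from the multiplication rule for independent events: with fair dice a single throw fails to give a double six with probability 35/36, so 24 independent throws all fail with probability (35/36)^24, and at least one double six occurs with probability 1 − (35/36)^24. A minimal sketch of that arithmetic, assuming independent throws (the function name and parameters are our own):

```python
from fractions import Fraction

def prob_at_least_one_double_six(n_throws=24, p_double_six=Fraction(1, 36)):
    """Probability of at least one double six in n_throws independent throws
    of a pair of dice, each throw giving double six with probability
    p_double_six (1/36 when both dice are fair)."""
    return 1 - (1 - p_double_six) ** n_throws

P = prob_at_least_one_double_six()
print(round(float(P), 3))  # → 0.491
```

Exact fractions avoid any rounding until the final step. Calling the function with `n_throws=25` gives a value just above one-half, which shows how narrowly the 24-throw game misses being favorable to the bettor on “double six.”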

Obviously, the problem solved by Pascal was a problem of probability:
from postulated relative frequencies relating to throws of two particular
dice, he deduced the relative frequency of a specified outcome of 24 throws.
This problem is one that belongs to the pure theory of probability, and is 
not related to statistics. 

De Méré's reasons for experimenting with dice and for questioning
Pascal were, probably, utilitarian: he wanted to know how to bet. The
essential point in the situation is that de Méré's probable decision to bet
against the appearance of the “double six” had to be based solely on the
hypothetical premise that the dice are fair or, in other words, that in the
forthcoming games the dice will behave in a manner postulated in the
solution of the problem in probability. Similar hypothetical premises are
unavoidable in all those cases when an action is taken in conformity with
the solution of some mathematical problem. However, there are situations
where, apart from the hypothetical premise of the general conformity of
the observable phenomena with the assumptions of the theory, the selection
of action to be taken depends on a special additional element.

1.3.3. Statistical problem of Chevalier de Méré. The incident just described
of the contacts between Chevalier de Méré and Pascal belongs to
history. We will now complement this story by visualizing some doubts
which Chevalier de Méré must have had in applying in practice the solution
of Pascal. The doubts may be presumed to concern the basic premise 
that the dice used by some other gambler are actually fair. If they were 
not, then the solution of the problem furnished by Pascal would have been 
valueless. 

The extensive experiments with throwing dice performed by de Méré
must have taught him that an entirely “fair” die is rather difficult to
manufacture. Therefore, when confronting a gambler using his own dice,
Chevalier de Méré was probably aware of the possibility that these dice
might not be quite fair although not necessarily intentionally loaded. Also,
there must have been cases in de Méré's experience where the intentional
loading of dice was not excluded. In all such cases he must have wondered
how to bet. It is safe to presume that, faced with the prospect of a game
with unfamiliar dice, Chevalier de Méré must have considered “kibitzing”
several, perhaps three, games and then deciding on what further action
to take. We may visualize the following three actions contemplated by de
Méré: (a₁) to consider that the dice are not intentionally loaded and to
bet against the “double six”; (a₂) to consider that the dice are slightly
biased in favor of “double six,” but not intentionally loaded and to bet
on “double six”; (a₃) to consider that the dice are loaded intentionally and,
therefore, to take strong action against the crooked adversary.

After deciding that his future action will have to be either a₁ or a₂ or a₃,
Chevalier de Méré must have thought of the proper choice of a particular
action in conformity with the outcome of the first three games of 24 
throws each. For example, he may have thought of the following two rules 
of inductive behavior: 

Rule (i). This rule is based on the number X of games in which “double
six” appeared at least once:

if X = 1 then take action a₁ (i.e., bet against “double six”);

if X = 2 then take action a₂ (i.e., bet on “double six”);

if X = either 0 or 3 then take action a₃.

Rule (ii). This rule is based on the total number Y of “double sixes” which
appeared in the 72 throws forming the three games witnessed:

if Y = either 0 or 1 or 2, then take action a₁
(i.e., bet against “double six”);

if Y = either 3 or 4 or 5 or 6, then take action a₂
(i.e., bet on “double six”);

if Y is either equal to or greater than 7, take action a₃.
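Written out as code, each rule is nothing more than a function from what de Méré observes to one of the three actions. A sketch, in which the string labels "a1", "a2", "a3" stand for a₁, a₂, a₃ and the function names are our own:

```python
def rule_i(x):
    """Rule (i): x is the number of the three witnessed games (0..3)
    in which at least one double six appeared."""
    if x not in range(4):
        raise ValueError("x must be 0, 1, 2 or 3")
    if x == 1:
        return "a1"  # bet against double six
    if x == 2:
        return "a2"  # bet on double six
    return "a3"      # x is 0 or 3: take strong action

def rule_ii(y):
    """Rule (ii): y is the total number of double sixes in the 72 throws."""
    if y not in range(73):
        raise ValueError("y must be between 0 and 72")
    if y <= 2:
        return "a1"  # bet against double six
    if y <= 6:
        return "a2"  # bet on double six
    return "a3"      # y >= 7: take strong action
```

Both functions are deterministic; whatever randomness there is in the action comes entirely from the randomness of the observed games.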

1.3.4. Comparison. Let us now examine more closely the two situations
just described and compare the nature of inductive behavior involved. We
shall begin by labeling the situations. The first situation, when de Méré
contacted Pascal, will be labeled the case of fair dice. The second, hypothetical
situation in which de Méré is faced with dice of unknown probabilistic
properties will be called the case of doubtful dice.

The nature of inductive behavior in these two cases has a certain point
in common, but there exists a difference of utmost importance. The common
point is that in both cases the decision to behave in one way or
another is based on the premise that the dice are not regulated by some 
sort of sleight of hand but behave in conformity with a strictly defined 
probabilistic scheme. 

The difference between the two situations is that in the case of fair dice,
the omnipresent premise of general conformity of the phenomena considered
with the basic assumptions of the mathematical theory is the only
basis for de Méré's decision to bet against the “double six,” whereas in
the case of doubtful dice the actions of the gambler also depend on a new
element which does not appear in the case of fair dice. This new element
is the outcome of the first three games which we expect de Méré to witness
before deciding what to do in the subsequent games.

The basic premise that all the games are chance games (although perhaps 
with loaded dice) implies that the outcome of the first three games is 
random. The adoption of rule (i) or rule (ii) makes the subsequent action
of Chevalier de Méré a well defined function of the outcome of the first
three games, which is considered as a random event. As a result, the choice of
the action is itself a random event, with the further consequence that it 
is possible to study various probabilities relating to this choice. 
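Because the witnessed games are random, so is the action a rule selects. The simulation below illustrates this for rule (i). It assumes, as a modeling choice of ours, that every throw independently yields a double six with the same probability p (1/36 for fair dice); the function names, trial count and seed are illustrative.

```python
import random

def rule_i(x):
    # Rule (i): x = number of the 3 witnessed games with at least one double six.
    return {1: "a1", 2: "a2"}.get(x, "a3")

def simulate_rule_i(p, n_trials=10_000, seed=1):
    """Estimate how often rule (i) leads to each action when each throw
    independently gives double six with probability p (an assumption)."""
    rng = random.Random(seed)
    counts = {"a1": 0, "a2": 0, "a3": 0}
    for _ in range(n_trials):
        # X = number of the 3 games of 24 throws showing at least one double six
        x = sum(
            any(rng.random() < p for _ in range(24))
            for _ in range(3)
        )
        counts[rule_i(x)] += 1
    return {action: c / n_trials for action, c in counts.items()}

print(simulate_rule_i(1 / 36))  # fair dice
```

For fair dice the exact frequencies, obtainable from the binomial law with n = 3 and P ≈ .491, are about .381, .368 and .250, and the simulated values should land close to these. A different seed gives a different sequence of actions but similar frequencies.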

1.3.5. Choice of rules. If confronted with the necessity of adopting rule
of inductive behavior (i) or (ii), Chevalier de Méré would probably have
wondered which of them would suit his purposes better. To decide, he
would have wished to know something about the properties of these rules.
The realization of the nature of these properties is decisive for the understanding
of the basic idea of mathematical statistics.

The only purpose of de Méré adopting either rule (i) or rule (ii) would
be to diminish the frequency of his being duped by crooked gamblers, to
increase the frequency of his betting on the favorable outcome of the game
(that with probability greater than one-half) and to decrease the frequency
of getting into unnecessary trouble with strong action a₃. Thus the relevant
property of the rules of inductive behavior is that of answering the question
how frequently, in each possible set of circumstances, a given rule will lead
to each of the three actions contemplated, a₁, a₂, a₃.

It is clear that the “circumstances” which are relevant to the problem
of a gambler in the case of doubtful dice are perfectly described by the
value of the probability, say P, that “double six” will occur at least once
in a game of 24 throws. If the dice are doubtful, then P is unknown and
may be anything between zero and unity. Yet, it is the value of P which
is relevant to the choice between the three actions, a₁, a₂, a₃.

Action a₃, if contemplated at all, would be appropriate in those cases
where P is close to either zero or unity. If P is less than one-half but not
very small, then the preferred action is a₁. Similarly, a₂ is indicated when
P exceeds one-half but is not too large. Further, it is essential to notice
that, if P is not very different from one-half, the adoption of action a₃
must be considered very undesirable.

Figure 1. Performance Characteristic of Rule (i).

In consequence, to describe the relevant properties of rules (i) and (ii) of inductive behavior so as to provide a basis for a rational choice between them, it is necessary to answer the question: if P has a given value, for example P = .5, how frequently would rules (i) and (ii) lead to each of the three actions $a_1$, $a_2$ and $a_3$?

For rules (i) and (ii), the answers to the above question, for every value of P from zero to unity, are summarized in Figures 1 and 2, respectively. In each case the quantity measured on the axis of abscissae is the probability P that a game of 24 throws of two dice will result in at least one “double six.” Thus, each point on the horizontal axis represents one of the circumstances of the game with doubtful dice which may present itself to a gambler. The quantity measured on the vertical axis is the probability that the rule of inductive behavior, either (i) or (ii), will lead to each of the three actions contemplated. The curves relating to actions $a_1$, $a_2$, and $a_3$ are marked $p(a_1)$, $p(a_2)$, and $p(a_3)$, respectively.

To illustrate the situation, consider the ordinates of the three curves in Figure 1 corresponding to P = .5. They are, respectively:

$$p(a_1) = p(a_2) = .375, \qquad p(a_3) = .250.$$

The interpretation of these figures is as follows: if P = .5, then rule (i) would lead to actions $a_1$ and $a_2$ with the same relative frequency, .375; action $a_3$ would be adopted in 25 percent of the cases.

The probabilities of the three actions for the same value P = .5, but corresponding to rule (ii), are

$$p(a_1) = .663, \qquad p(a_2) = .333, \qquad p(a_3) = .004.$$

Thus, when the probability of at least one “double six” in a game is equal to one-half, rule (ii) will lead the gambler to bet against “double six” in 66.3 percent of all cases, to bet on “double six” in 33.3 percent of all cases and to accuse his opponent of cheating in only .4 percent of the cases.

When P = .5, it is immaterial whether one bets on or against “double six.” On the other hand, the accusation of cheating would be most inappropriate in this case. Therefore, for the particular value P = .5, the rule of inductive behavior (ii) has a definite advantage over rule (i).

Unless Chevalier de Méré was rather pugnacious, if he interpreted the results as above, inspection of the two diagrams might have suggested that the rule of inductive behavior (ii) is preferable to (i) for many values of P in addition to P = .5. However, it will be noticed that rule (ii) has the following weakness. Should the adversary of Chevalier de Méré be prepared to cheat and should he know that Chevalier de Méré is following rule (ii), then he could adjust his dice to bring “double six” with a relative frequency of about P = .55, i.e., favoring the “double six.” A glance at Figure 2 shows that in these circumstances the chances of action $a_3$ are minute and that in about six times out of ten rule (ii) would lead to betting against “double six.” Simple calculations, which will be explained in the following chapters, show that with P = .55 rule (ii) would lead Chevalier de Méré to lose about 51 games out of a hundred. Thus, with enough playing, his adversary could cheat him out of his money. No such quiet enrichment is in prospect for the owner of the dice if Chevalier de Méré adopts rule (i). If this rule is modified to omit action $a_3$ entirely, with the obvious changes in regard to actions $a_1$ and $a_2$, it can be shown that every artificial increase of the probability P above one-half, and also every decrease below one-half, will result in an increase of winnings to Chevalier de Méré. Thus, the modified rule (i) would make it expedient for de Méré's adversary to load the dice so that the probability of at least one “double six” is equal to one-half. In this case the game would be a fair game.
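The claim about the modified rule can be checked numerically under two assumptions of ours that the text does not spell out: that rule (i) watches three games, so that the “obvious changes” amount to betting against “double six” ($a_1$) when at most one of the three observed games showed it and betting on it ($a_2$) otherwise; and that each bet is an even-money wager on a further, independent game. With $q(P)$ the probability that at least two of three games show “double six,” the chance of winning a bet is $q(P)\,P + (1 - q(P))(1 - P)$. A short Python sketch (the function name is illustrative):

```python
from math import comb

def win_probability_modified_rule_i(P):
    """Chance that de Mere wins one even-money bet under the assumed modified rule (i)."""
    # q = probability that at least two of the three observed games show "double six"
    q = sum(comb(3, k) * P**k * (1 - P) ** (3 - k) for k in (2, 3))
    # He bets on "double six" with probability q and wins if it then occurs (prob. P);
    # otherwise he bets against it and wins if it does not occur (prob. 1 - P).
    return q * P + (1 - q) * (1 - P)

for P in (0.3, 0.45, 0.5, 0.55, 0.7):
    print(P, round(win_probability_modified_rule_i(P), 4))
```

Algebraically, the winning chance minus one-half equals $(2P - 1)\,(q(P) - \tfrac12)$. Since $q$ increases with P and $q(.5) = .5$, both factors change sign at P = .5, so the product is never negative and vanishes only at P = .5.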

After this introduction we define several useful concepts, some of which 
were already mentioned, and describe the scope of mathematical statistics 
as understood in this book. More detailed discussion of the concepts 
involved will be found in Chapter 5. 


1·3·6. Randomness. Consider an experiment capable of yielding one or the other of some n different outcomes $E_1, E_2, \ldots, E_n$. If a mathematical treatment of this experiment is adopted postulating that to each of the outcomes $E_1, E_2, \ldots, E_n$ there corresponds a definite probability, whether known or unknown, whether the same for each $E_i$ or not, then the experiment in question will be described as random. The adjective random will also be applied to describe the outcomes $E_1, E_2, \ldots, E_n$.

Thus, if it is taken for granted that, in the process of throwing a die, to each number of dots on the upper side (these are the six possible outcomes of the trial) there corresponds an appropriate probability, then the throws of the die are random. Obviously, the question of whether or not it is appropriate to postulate probabilities in any particular case lies outside the theory.

1·3·7. Rule of inductive behavior. We have already used the phrase “rule of inductive behavior.” The precise definition is as follows.

Let $E_1, E_2, \ldots, E_n, \ldots$ be all possible different outcomes of an experiment or of observations relating to some phenomena. Let $a_1, a_2, \ldots, a_n, \ldots$ be all the different actions contemplated in connection with these phenomena.

If a rule R unambiguously prescribes the selection of an action for each possible outcome $E_i$, then it is a rule of inductive behavior.

In this definition it is not required that the outcomes $E_1, E_2, \ldots, E_n, \ldots$ be random. However, all rules of inductive behavior considered in this book relate to random events.

1·3·8. Statistical decision function. Let R be any rule of inductive behavior relating to some random experiment with the different possible outcomes $E_1, E_2, \ldots, E_n, \ldots$ and let $a_1, a_2, \ldots, a_n, \ldots$ be the different actions prescribed by this rule.

The statistical decision function of the rule R is the function $a(E)$ establishing the correspondence between the possible outcomes of the experiment and the actions to be taken in accordance with the rule R.

The argument of the statistical decision function is, then, the possible outcome of the random trial. The “value” of the statistical decision function is the action prescribed by the rule. It is obvious that to define a rule of inductive behavior means to define its statistical decision function.

In the case of Chevalier de Méré, rules (i) and (ii) were defined by statistical decision functions capable of taking only three values: actions $a_1$, $a_2$, and $a_3$. In many problems statistical decision functions are two-valued. In other cases, however, the set of different values of a statistical decision function may be infinite.
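For instance, suppose rule (i) is taken to be the rule that observes three games and reacts to the number $k(E)$ of them in which at least one “double six” occurred. This is an assumed form, chosen because it reproduces the ordinates .375, .375, .250 quoted for Figure 1 at P = .5. Its statistical decision function would then be three-valued:

$$
a(E) = \begin{cases}
a_1 & \text{if } k(E) = 1,\\
a_2 & \text{if } k(E) = 2,\\
a_3 & \text{if } k(E) = 0 \text{ or } k(E) = 3.
\end{cases}
$$

Its argument ranges over the $2^3 = 8$ possible outcomes of the three games, and its values are the three actions.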

1·3·9. Set of admissible hypotheses. The necessity of adopting a rule of inductive behavior relating to some random trial T arises when the probabilities of the different outcomes of this trial are not known and when the desirability of this or that possible action depends on the values of these probabilities. This point was illustrated in the foregoing discussion of the case of doubtful dice, where the desirability of the actions $a_1$, $a_2$, and $a_3$ depends on the value of just one probability P, that in the course of 24 throws of two dice “double six” will occur at least once. The two suggested rules of inductive behavior (i) and (ii) were compared on the assumption that P may have any value between zero and unity.

In other cases the desirability of particular actions contemplated may depend on the value of not one but of several probabilities connected with the trial T. Also, this desirability may be determined not by the probabilities themselves but by the values of some quantities $\theta_1, \theta_2, \ldots, \theta_s$, termed parameters, which in turn determine the probabilities.

Whatever the case may be, a rational choice among the possible rules of inductive behavior requires the definition of the set, say $\Omega$, of values of probabilities underlying the trial T, or of the set of values of parameters, which in the particular problem are considered as possible or admissible. It is easy to see that the choice of the rule of inductive behavior may well depend on how wide the set $\Omega$ is. The problem of doubtful dice may be used to illustrate this point. Suppose, for example, that the dice used are known to have been manufactured by some process which may cause only a particular kind of bias, namely, that of diminishing the probability P of a “double six.” In this case the set $\Omega$ of possible values of P is limited to the interval $0 \le P \le .5$ and the choice between rules (i) and (ii) depends on the sections of Figures 1 and 2 to the left of the point P = .5. The decision as to which of the two rules is more advantageous in this case may well be different from the one in the case where all the possible values of P between zero and unity are admissible.

Let the probabilities underlying a random trial T be determined by the system, say $\theta$, of values of some parameters $\theta_1, \theta_2, \ldots, \theta_s$. The set of all systems $\theta$ which in a given case are considered as possible is called the set of admissible hypotheses. Each system $\theta$ will be called either an admissible simple hypothesis or an admissible “parameter point.”

It is usual to denote the set of admissible hypotheses by the letter $\Omega$.

1·3·10. Performance characteristic. Let R be a rule of inductive behavior relating to some random trial T, and let $a_1, a_2, \ldots, a_n, \ldots$ be the different actions prescribed by R. Further, let $\Omega$ be the set of admissible hypotheses $\theta$.

For each particular $\theta$ of the set $\Omega$, consider the probability $p(a_n \mid \theta)$ that the application of rule R will lead to action $a_n$, for $n = 1, 2, \ldots$. Each probability $p(a_n \mid \theta)$ is, then, a function of $\theta$ defined over the whole set $\Omega$. The system of functions $p(a_1 \mid \theta), p(a_2 \mid \theta), \ldots, p(a_n \mid \theta), \ldots$, representing the probabilities that the rule R will lead to particular actions $a_1, a_2, \ldots, a_n, \ldots$, is called the performance characteristic of rule R.
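The definition suggests a direct computation: for each admissible value of the parameter, $p(a_n \mid \theta)$ is the total probability of those outcomes E for which the decision function gives $a(E) = a_n$. The Python sketch below does this by enumerating outcomes in the doubtful-dice problem, with $\theta = P$. It rests on an assumption of ours about rule (i), chosen because it reproduces the ordinates .375, .375, .250 quoted for Figure 1 at P = .5: the rule observes three independent games and takes $a_1$ if exactly one of them shows a “double six,” $a_2$ if exactly two do, and $a_3$ otherwise. The names `decision` and `performance_characteristic` are illustrative.

```python
from itertools import product

def decision(outcome):
    """Assumed decision function of rule (i).

    outcome is a tuple of three booleans; True means that game showed
    at least one "double six".
    """
    k = sum(outcome)                          # number of games with "double six"
    return {1: "a1", 2: "a2"}.get(k, "a3")    # k = 0 or 3 -> accuse of cheating

def performance_characteristic(P):
    """Return {action: p(action | P)} by summing over all 2**3 outcomes."""
    result = {"a1": 0.0, "a2": 0.0, "a3": 0.0}
    for outcome in product((False, True), repeat=3):
        # Probability of this particular sequence of independent games.
        prob = 1.0
        for game in outcome:
            prob *= P if game else 1 - P
        result[decision(outcome)] += prob
    return result

for P in (0.2, 0.4, 0.5, 0.6, 0.8):
    pc = performance_characteristic(P)
    print(P, {a: round(p, 3) for a, p in pc.items()})
```

Tabulating such values over a grid of P is, in effect, how curves like those of Figure 1 would be drawn. The values at P = .2, .4, .6, .8 are also the ones relevant to Problem 3 below.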

A rational choice between several possible rules of inductive behavior is 
possible only on the basis of their performance characteristics. Figures 1 
and 2 above give graphs of the performance characteristics of the two rules 
of inductive behavior (i) and (ii) relating to the problem of doubtful dice. 

1·3·11. Scope of mathematical statistics. Mathematical statistics is a branch of the theory of probability. It deals with problems relating to performance characteristics of rules of inductive behavior based on random experiments.




Some such problems were illustrated above: given a rule of inductive behavior, to determine its performance characteristic. The more interesting and more important problems are of a slightly different character: given a random trial, given a set $\Omega$ of admissible hypotheses, and given the possible actions $a_1, a_2, \ldots$ contemplated, to determine the rule (or rules) of inductive behavior whose performance characteristics have some prescribed (“optimum”) properties.

In relation to Chevalier de Méré in the case of doubtful dice, the relevant problem may be formulated as follows: is there a rule $R_0$ of inductive behavior such that, in whichever way the dice are loaded, the adoption of $R_0$ will ensure that the chance of de Méré's winning is greater than that obtainable by any other conceivable rule R? If so, then what is this optimum rule $R_0$?

In modern times we are less concerned with games of dice and more with problems of science. Hence, instead of the problems of Chevalier de Méré, statistical research has in mind the doubts of an experimenter or of an observer. The role of the opponent of Chevalier de Méré is played by the existing, but still undetected, permanencies which we may summarize in the single word Nature.

1·3·12. Historical note. The first statistical problems which attracted
attention were those involving a two-valued statistical decision function. 
To begin with, only particular problems were attacked concerning rules 
of inductive behavior suggested by intuition. The honor of initiating work 
in this direction seems to belong to the French mathematical genius 
Pierre-Simon de Laplace [11] (1749-1827) in his studies relating to the 
origin of comets. More recent remarkable results in this respect are due 
to Karl Pearson [19], “Student” [20], and R. A. Fisher [5]. The general 
theory relating to two-valued statistical decision functions, known as the 
theory of testing statistical hypotheses, was started in 1928 [17] with the 
basic memoir [18] appearing in 1933. Almost simultaneously another 
category of problems of inductive behavior came under consideration, involving statistical decision functions with infinitely many values. This is
known as the theory of statistical estimation and falls under two headings: 
point estimation and estimation by intervals. Remarkable early results 
under the first heading are due to Markoff [12], R. A. Fisher [6], Hotelling 
[7], and Doob [4]. 

The theory of estimation by interval is also known as the theory of confidence intervals. Although the first results relating to this theory appeared in 1934, the first comprehensive memoir [14] was published in 1937. The general theory of statistical decision functions, embracing both the theory of testing statistical hypotheses and the theory of estimation, is due to Abraham Wald [23].

Recent years have brought a great number of important results, some of which will be quoted in the following sections. An important compendium of these results on an advanced level is to be found in a recent two-volume book by Kendall [9]. A remarkable synthesis, also on an advanced level, is due to Cramér [3].

PROBLEMS AND EXERCISES 

1. Describe a situation in everyday life which answers the description of 
inductive behavior relating to the outcomes of a “random experiment.”
Describe the possible actions to be taken and the possible outcomes of the 
experiment. Formulate exactly a rule of inductive behavior which seems 
reasonable to you. Without any computations, make a rough sketch of 
what you expect the performance characteristic of this rule to be. 

2. Imagine you are Chevalier de Méré and verify empirically that, with reasonably fair dice, the rule of inductive behavior (ii) leads to the strong action $a_3$ much less frequently than rule (i). Describe in detail the experiment needed to establish this fact. Perform this experiment and comment on the results.

3. Imagine you are Chevalier de Méré and wish to verify experimentally the accuracy of the performance characteristic of rule (i) as given in Figure 1. For this purpose arrange an experiment with the probability of a particular outcome A equal to .2 (alternatively, to .4, .6, .8). For example, you may put five balls in a bag, one of them black (alternatively two black, three black and four black) and the others white. Your experiment will then consist in drawing a ball at random out of the bag. The outcome A will consist in drawing a black ball. Perform some 20 series of triple experiments, replacing the ball in the bag after each draw and shaking the bag. Treat each triplet of experiments as if it were three games of dice (24 throws of a pair) which are kibitzed by Chevalier de Méré, and treat the drawing of a black ball as the appearance of at least one “double six.” Count the cases when rule (i) would lead to actions $a_1$, $a_2$ and $a_3$, respectively. Compute the relative frequencies of these actions and compare them with the probabilities as read from Figure 1. What relative frequency of action $a_1$ would you expect if the number of black balls in the bag is one (two, three, four)? What are the expected relative frequencies of action $a_3$ in the same circumstances?

REFERENCES 

1. E. Borel, Valeur Pratique et Philosophie des Probabilités. Paris: Gauthier-Villars, 1939.

2. A. H. Copeland, “The theory of probability from the point of view of admissible numbers.” Ann. Math. Stat., Vol. 3 (1932), p. 143.

3. H. Cramér, Mathematical Methods of Statistics. Princeton, N. J.: Princeton University Press, 1946.

4. J. L. Doob, “Probability and statistics.” Trans. Amer. Math. Soc., Vol. 36 (1934), p. 759.

5. R. A. Fisher, Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd; first edition appeared in 1925.

6. R. A. Fisher, “On the mathematical foundations of theoretical statistics.” Philos. Trans. Roy. Soc., A, No. 222 (1921), p. 309.

7. H. Hotelling, “The consistency and ultimate distribution of optimum statistics.” Trans. Amer. Math. Soc., Vol. 32 (1930), p. 847.

8. H. Jeffreys, Theory of Probability. Oxford: Clarendon, 1939.

9. M. G. Kendall, The Advanced Theory of Statistics, Vols. 1 and 2. London: Griffin, 1943 and 1946.

10. J. M. Keynes, A Treatise on Probability. London: Macmillan, 1921.

11. P. S. de Laplace, Oeuvres Complètes, Vol. 8 (1841 edition). Paris: Gauthier-Villars, p. 249.

12. A. Markoff, Calculus of Probability (Russian). St. Petersburg: Academy of Sciences, 1913.

13. R. von Mises, Probability, Statistics and Truth. London: W. Hodge, 1939.

14. J. Neyman, “Outline of a theory of statistical estimation based on the classical theory of probability.” Philos. Trans. Roy. Soc., A, No. 236 (1937), p. 330.

15. J. Neyman, “L'estimation statistique traitée comme un problème classique de probabilité.” Actualités Sci. Indust., No. 739 (1938), p. 25.

16. J. Neyman, “Basic ideas and some recent results of the theory of testing statistical hypotheses.” J. Roy. Stat. Soc., Vol. 105 (1942), p. 292.

17. J. Neyman and E. S. Pearson, “On the use and interpretation of certain test criteria for purposes of statistical inference.” Biometrika, Vol. 20-A (1928), pp. 175 and 263.

18. J. Neyman and E. S. Pearson, “On the problem of the most efficient tests of statistical hypotheses.” Philos. Trans. Roy. Soc., A, No. 231 (1933), p. 289.

19. K. Pearson, “On a criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen in random sampling.” Phil. Mag. (5), Vol. 50 (1900), p. 157.

20. “Student,” “On the probable error of the mean.” Biometrika, Vol. 6 (1908), p. 1.

21. I. Todhunter, A History of the Mathematical Theory of Probability. London: Macmillan, 1865.

22. A. Wald, “Die Widerspruchsfreiheit des Kollektivbegriffes.” Actualités Sci. Indust., No. 735 (1938), p. 79.

23. A. Wald, “Contributions to the theory of statistical estimation and testing hypotheses.” Ann. Math. Stat., Vol. 10 (1939), p. 299.



CHAPTER II. 


Probability 


2·1. Basic Concepts

2·1·1. Use of the word “probability.” In everyday life the word probability is used in many different connections and has a meaning related to varying degrees of belief in the occurrence or nonoccurrence of various events, somewhat combined with wishful thinking. Such, for example, are statements to the effect that “it is probable that the weather during the next weekend will be magnificent.”

In the mathematical theory the same word, “probability,” is used in a strictly defined sense, not necessarily connected with beliefs and certainly not with any form of wishful thinking. To avoid misunderstanding, an effort is required to draw a sharp line between the everyday meaning of “probability” and the contents of the mathematical concept designated by this word.

To facilitate the establishment of appropriate associations between probability in the sense of mathematics and its definition, we will strive to use the word “probability” in one connection only: “the probability of an object A having the property B.” It will be denoted by the symbol $P\{B \mid A\}$. Hence we may speak of the “probability of a chair being black,” of the “probability of a table being black,” etc.

The advantage of this way of speaking is that it emphasizes the important circumstance that the concept of probability is relative to the set of objects which “are A,” i.e., to the set of objects which satisfy the definition of “A.”

The disadvantage of the phrase “probability of an object A having the property B” is that it is much too long to be used conveniently. Therefore, some abbreviations are unavoidable, and the abbreviation suggested is “probability of B, given A.” Some other abbreviations are also possible. However, whatever the abbreviation used, it is essential to remember that it is only an abbreviation.

Since many errors in computing probabilities are committed because of losing sight of the set of objects to which a given probability is meant to refer, it is well to give a label to this set. It is called the fundamental probability set. Thus, the words “fundamental probability set,” or F.P.S. for short, are to be used to designate the set S of objects A to which the probabilities under consideration refer. The objects A forming the set S will occasionally be described as elements of S.




2·1·2. Definition of probability. If the F.P.S. is finite, then the definition of probability can be given in very simple terms. On the other hand, if the F.P.S. contains infinitely many objects, then the definition of probability requires the use of some concepts of the theory of functions relating to infinite sets. In order to keep this book on an elementary level, the definition of probability will be given only for finite fundamental probability sets.

Consider a finite set S of objects A to be treated as the fundamental probability set. Let n(A) be the number of all objects A in S and let n(AB) be the number of those objects A which possess the property B. It will always be assumed that n(A) > 0, that is to say, that the F.P.S. contains some objects.

Definition 2·1. The probability of an object A possessing the property B is simply the ratio

$$P\{B \mid A\} = \frac{n(AB)}{n(A)}.$$
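For a finite F.P.S. the definition is just a count and a ratio. A minimal Python sketch follows; the helper name `probability`, and the representation of the F.P.S. as a list of objects with the property B given as a predicate, are our own illustrative choices. The example uses the six sides of a die as the F.P.S., as in the first of the illustrations that follow:

```python
from fractions import Fraction

def probability(fps, has_property):
    """P{B | A} = n(AB) / n(A) for a finite fundamental probability set.

    fps          -- list of the objects A forming the F.P.S. (must be non-empty)
    has_property -- predicate returning True for objects possessing property B
    """
    n_A = len(fps)
    if n_A == 0:
        raise ValueError("the F.P.S. must contain some objects (n(A) > 0)")
    n_AB = sum(1 for obj in fps if has_property(obj))
    return Fraction(n_AB, n_A)

# F.P.S.: the six sides of an ordinary die, labelled by their number of dots.
sides = [1, 2, 3, 4, 5, 6]
print(probability(sides, lambda dots: dots == 6))   # 1/6
print(probability(sides, lambda dots: dots <= 6))   # 1  (a sure property)
print(probability(sides, lambda dots: dots == 7))   # 0  (an impossible property)
```

Using `Fraction` keeps the result an exact rational number between zero and unity, as the definition guarantees.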


Obviously, whatever be the property B, the number n(AB) must lie between the limits $0 \le n(AB) \le n(A)$. Therefore, the probability $P\{B \mid A\}$ is necessarily a rational number between zero and unity. If $P\{B \mid A\} = 0$, then the number n(AB) of objects A having the property B is also equal to zero, and B is called a property impossible in the particular F.P.S. At the other extreme, if $P\{B \mid A\} = 1$, all the elements of the F.P.S. possess the property B and this property is called a sure property in the particular F.P.S. It is clear that the same property B may be impossible in one F.P.S. and sure in another, etc.

Upon examining the above definition of probability, it must be clear why its extension to infinite F.P.S.'s requires concepts of advanced mathematics. The reason is that, if the F.P.S. is infinite, then the number n(A) of its elements has no meaning. Instead, we have to use another number, called measure, which, in a certain sense, plays a role similar to the number of elements. It may be useful to remark that with an infinite F.P.S., the equalities $P\{B_1 \mid A\} = 0$ and $P\{B_2 \mid A\} = 1$ do not indicate that the property $B_1$ is necessarily impossible or that the property $B_2$ is necessarily a sure property.

Let us now use the definition given above and compute the probability 
in a few simple examples. 


2·2. Illustrations

2·2·1. Given an ordinary die, what is the probability, say $P_1$, of its side having six dots on it?

An ordinary die has six sides. Hence, the F.P.S. is composed of n(A) = 6 elements. Only one of the sides has six dots on it. Hence n(AB) = 1 and

$$P_1 = P\{\text{six dots} \mid \text{side of die}\} = \frac{1}{6}.$$

It will be noticed that in this problem there is no mention of the die 
being fair or biased and the answer is entirely independent of any such 
assumption. 


2·2·2. Given an ordinary die, what is the probability, say $P_2$, of its falling with six dots up?

A cursory examination of this problem must reveal that (i) the probability $P_2$ is different from the probability $P_1$ in the preceding problem, in the sense that it refers to a different F.P.S., and (ii) the conditions under which the value of $P_2$ is to be computed are not specified in a precise manner.

The solution of any problem in probability must be preceded by an accurate description of the F.P.S. to which the probability is meant to refer. In the particular case of the probability $P_2$, it is obvious that the F.P.S. is to consist of “objects” described as “falls of an ordinary die.” This, however, does not sufficiently describe the F.P.S. and, before attempting to compute $P_2$, we must ask what particular falls are contemplated. These may be, for example, some n(A) = 100 falls which were already observed in an experiment. If so, then to compute $P_2$ it is sufficient to count the particular falls in which the die fell with six up and to divide the result by 100.

Alternatively, the F.P.S. may be meant to consist of some n(A) = 100 falls of the die which are to be observed in some future experiment. If this is the exact interpretation of the problem, then until the experiment is actually performed, the probability $P_2$ must remain unknown. To determine the value of $P_2$, it is necessary to perform the 100 throws, to count those which result in “six up,” and to divide the result by 100.

The above two interpretations of the problem are uninteresting and lead 
to trivial results. The interesting problems of probability are those in 
which the F.P.S. is a hypothetical one and is indirectly described in the 
statement of the problem. The difficulties in solving such problems are 
frequently connected with the looseness of their habitual statement and 
with the necessity of translating them into precise terms. 

Simple as these points are, we insist on them because of numerous misunderstandings found in the literature. See, for example, reference [8] at the end of Chapter 1, pp. 300 et seq.

2·2·3. Given a deck of 52 cards, what is the probability, say $P_3$, that a hand of 5 cards includes the ace of spades?

This is again a typical problem of probability with a loose statement of conditions. The F.P.S. is obviously composed of “hands of five cards each,” but it remains to be guessed what particular hands are to be included. The interpretation of the problem which is customary and nontrivial is that the F.P.S. consists of all possible hands of 5 cards each which differ from each other by at least one card. With this interpretation, to compute the probability $P_3$ it is necessary first to establish the number n(A) of ways in which different hands of five cards can be selected from the deck of 52 cards. The next step is to obtain n(AB), that is, to count those different groups of 5 cards which include the ace of spades.

Obviously n(A) is the number of combinations* of 5 out of 52,

$$n(A) = C_{52}^{5} = \frac{52!}{5!\,47!}.$$


An easy method of determining n(AB) is to imagine that one is actually given a deck of 52 cards and is required to produce in turn all the n(AB) different groups which include the ace of spades. The method of obtaining these groups which will probably suggest itself is to take the ace of spades out of the deck and hold it in readiness to be joined with some other four cards selected from the deck. On realizing this, it must be obvious that the number n(AB) is simply the number of ways in which different groups of four cards each can be selected from the incomplete deck of 51 cards. Thus

$$n(AB) = C_{51}^{4} = \frac{51!}{4!\,47!}$$

and

$$P\{\text{ace of spades} \mid \text{possible hand of 5 cards}\} = \frac{n(AB)}{n(A)} = \frac{51!}{4!\,47!}\cdot\frac{5!\,47!}{52!} = \frac{5}{52}.$$
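The arithmetic can be confirmed with Python's standard-library `math.comb`, which returns the number of combinations of n things taken k at a time; `Fraction` keeps the ratio exact:

```python
from fractions import Fraction
from math import comb

n_A = comb(52, 5)    # all different hands of 5 cards from a 52-card deck
n_AB = comb(51, 4)   # hands containing the ace of spades: the ace plus 4 of the other 51
P3 = Fraction(n_AB, n_A)
print(n_A, n_AB, P3)  # 2598960 249900 5/52
```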

On occasion, a result of the kind above is interpreted to mean that, with reasonable shuffling, hands of five cards including the ace of spades will be dealt with a relative frequency of about 5 in 52. If the reader performs an experiment, he is likely to become convinced that the exactitude of this interpretation depends very much on what is meant by “reasonable shuffling of cards.” With some methods of shuffling which may seem perfectly reasonable, the frequencies may appear to be very different from 5/52, and the difficulty may be resolved by postulating that only such methods of shuffling are to be called “reasonable” or “fair” as lead to relative frequencies of hands approximating the probabilities computed as in the present example. What particular methods of shuffling satisfy these requirements is a matter for empirical study.

*Formulae relating to combinations, permutations, and arrangements are expected to be known to the reader from a course in elementary algebra. However, for convenience, these formulae are deduced in Section 2·5 of this chapter, pp. 34-39.




2·2·4. The F.P.S. considered in the preceding problem consists of all possible groups of five cards which differ from each other by at least one card. Such groups may be called unordered groups. In the present paragraph we will consider a problem relating to groups of cards (or other objects) which we will agree to consider as distinct when they differ either by at least one card or by the order in which the cards are arranged. The class of such groups of objects selected from a given set will be called the class of ordered groups. The two groups of five letters

a, b, c, d, e
b, d, c, a, e

are considered as identical unordered groups but as different ordered groups. With these definitions, let us compute the probability $P_4$ that in a hand of five cards the second and the fourth cards are aces.

The fundamental probability set will be taken to consist of all the n(A) ordered groups A of five cards which are obtainable from a deck of 52 cards. Obviously n(A) equals the number of arrangements of 52 elements taken five at a time:

$$n(A) = A_{52}^{5} = \frac{52!}{47!}.$$


Now compute the number of those ordered hands which possess the required property: that the second and fourth cards are aces.

Following the method suggested in the previous problem, imagine that 
one is required to produce effectively in turn all the ordered groups in 
question. One would start by taking two aces out of the deck, e.g., the 
ace of spades and the ace of hearts, and have them ready to insert into 
the group of five cards with the ace of spades as the second card and the 
ace of hearts as the fourth. 

With this pair of aces, taken in this particular order, it will be possible 
to obtain as many different ordered groups of five cards as there are 
different ways of selecting and ordering three cards out of the 50 cards 
remaining in the deck. This number is 


$$A_{50}^{3} = \frac{50!}{47!}.$$


The same number $A_{50}^{3}$ of new ordered five-card groups will be obtained with any ordered group of two aces. Since the number of different ordered groups of two aces is $A_{4}^{2}$, the total number of ordered five-card groups in which the second and the fourth cards are aces is

$$n(AB) = A_{50}^{3}\,A_{4}^{2} = \frac{50!}{47!}\cdot\frac{4!}{2!}.$$





It follows that

$$P\{\text{2nd and 4th cards aces} \mid \text{ordered hand of 5 cards}\} = \frac{50!}{47!}\cdot\frac{4!}{2!}\cdot\frac{47!}{52!} = \frac{4 \cdot 3}{52 \cdot 51} = \frac{1}{221}.$$
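Python's `math.perm(n, k)` (available since Python 3.8) counts the arrangements of n elements taken k at a time, so the computation can be checked exactly:

```python
from fractions import Fraction
from math import perm

n_A = perm(52, 5)                 # ordered hands of 5 cards from 52
n_AB = perm(50, 3) * perm(4, 2)   # ordered choice of 3 non-place-2/4 cards from the 50 left,
                                  # times the ordered pair of aces for places 2 and 4
P4 = Fraction(n_AB, n_A)
print(P4)  # 1/221
```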

The student will have no difficulty in verifying that the same answer
would be obtained if, instead of the second and fourth cards, it were
required that the first and the second be aces. Moreover, exactly the same
reasoning leads to the solution of the following more general problem.

Let S be a set of N objects of which $n \le N$ possess some distinguishing
property B (B-objects, for short). Let A stand for an ordered group of
$m \ge 2$ objects selected from S. Finally, let i and j stand for two different
integers not exceeding m.

Then

$$P\{\text{the } i\text{th and } j\text{th objects are } B\text{-objects} \mid A\} = \frac{n(n-1)}{N(N-1)}.$$

This formula reads as follows: the probability that an ordered group A
of m objects selected from S will have B-objects in the ith and the jth
places is equal to $n(n-1)/[N(N-1)]$. It may be of some interest that
this probability does not depend on i, j, or m but only on n and N. This
result will be used later.
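For small sets the general result can be confirmed by exhaustive enumeration, in the spirit of the direct method. The sketch below uses a hypothetical helper `prob_b_objects_at` and illustrative values N = 6, n = 3 (not from the text). It lists every ordered group of m objects with `itertools.permutations` and counts those with B-objects in places i and j.

```python
from fractions import Fraction
from itertools import permutations


def prob_b_objects_at(i, j, N, n, m):
    """Exhaustive check of P{ith and jth objects are B-objects | A}.

    The N objects are labelled 0..N-1, and objects 0..n-1 are taken to be the
    B-objects.  Every ordered group of m distinct objects is generated once.
    Places i and j are 1-based, as in the text.
    """
    hits = total = 0
    for group in permutations(range(N), m):
        total += 1
        if group[i - 1] < n and group[j - 1] < n:
            hits += 1
    return Fraction(hits, total)


N, n = 6, 3
for i, j, m in [(1, 2, 2), (2, 4, 4), (1, 5, 5)]:
    # Each line should show 1/5, i.e. n(n - 1)/N(N - 1) = 3*2/(6*5).
    print(i, j, m, prob_b_objects_at(i, j, N, n, m))
```

The output is the same for every choice of i, j and m, as the formula predicts.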

2-3. Relative or conditional probability

2-3-1. Definition. The adopted symbol of probability, $P\{B \mid A\}$, that
an object A will have or has the property B, includes reference both to
the particular property (on the left of the vertical bar) and to the object
A (on the right of the bar). The reference to the set of objects to which a
given probability applies is particularly important in all those cases where
the solution of the problem requires consideration of probabilities of the
same property B relating to several different fundamental probability sets
$S_0, S_1, S_2, \ldots$. In such cases one of the sets, say $S_0$, usually includes
all the others. Obviously, in problems of this kind, omitting from the
symbols of probability the reference to the appropriate fundamental
probability set is likely to cause misunderstanding. To decrease the chance of
misunderstanding, it is desirable to attach a special label to those
probabilities of the property B which refer to one or the other of the subsets
$S_1, S_2, \ldots$ rather than to the all-inclusive fundamental probability set $S_0$.

Let $S_0$ be the all-inclusive F.P.S. of some objects A and let $S_1$ be the
part of $S_0$ composed of all the objects A which have the property $B_1$, so
that the combined property $AB_1$ serves as the definition of all the elements
of $S_1$. Further, let $B_2$ stand for some other property which the objects A
may or may not possess. If the set $S_1$ is not empty, then the probability
of $B_2$ can be considered alternatively with respect to $S_0$ and with respect
to $S_1$. The probability of $B_2$ referred to the whole set $S_0$ of objects A is
denoted by $P\{B_2 \mid A\}$. The probability of the same property $B_2$ referred
to the set $S_1$ is denoted by $P\{B_2 \mid AB_1\}$. In conformity with the general
definition of probability, we have

$$P\{B_2 \mid A\} = \frac{n(AB_2)}{n(A)}, \qquad P\{B_2 \mid AB_1\} = \frac{n(AB_1B_2)}{n(AB_1)},$$

where $n(AB_1)$ and $n(AB_2)$ are the numbers of objects A which possess
properties $B_1$ and $B_2$, respectively, and where $n(AB_1B_2)$ is the number
of those objects A which possess both of the properties $B_1$ and $B_2$.

Obviously, $P\{B_2 \mid AB_1\}$ need not equal $P\{B_2 \mid A\}$, and therefore the
distinction between the two is essential. To emphasize this
distinction, $P\{B_2 \mid AB_1\}$ is called the relative or the conditional probability
of $B_2$ given $B_1$. By contrast, in references to $P\{B_2 \mid A\}$, we occasionally
use the expression “absolute probability.”


2-3-2. Example. Let A stand for “ball,” $B_1$ for “small” and $B_2$ for
“black.” Let there be one small black ball, two small white balls and three
large white balls on the table. Consider these six balls as the all-inclusive
F.P.S. Then the absolute probability of $B_2$ is one-sixth and the relative
probability of $B_2$ given $B_1$ is one-third: $P\{B_2 \mid A\} = 1/6$, $P\{B_2 \mid AB_1\} = 1/3$.
Similarly, $P\{B_1 \mid A\} = 1/2$ and $P\{B_1 \mid AB_2\} = 1$.
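The four probabilities of the example can be reproduced by counting, each one computed as a ratio of numbers of objects, exactly as in the definition. In this sketch the names `BALLS`, `is_small`, `is_black` and `prob` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

# The six balls of the example, each described by (size, colour).
BALLS = [("small", "black"),
         ("small", "white"), ("small", "white"),
         ("large", "white"), ("large", "white"), ("large", "white")]


def is_small(ball):          # property B1
    return ball[0] == "small"


def is_black(ball):          # property B2
    return ball[1] == "black"


def prob(prop, given=None):
    """Relative frequency of `prop` among the balls that also have property
    `given` (among all balls if given is None), as a ratio of counts."""
    subset = [b for b in BALLS if given is None or given(b)]
    return Fraction(sum(1 for b in subset if prop(b)), len(subset))


print(prob(is_black), prob(is_black, is_small))   # 1/6 1/3
print(prob(is_small), prob(is_small, is_black))   # 1/2 1
```

Passing `given` simply restricts the count to a subset such as $S_1$, which is all that a relative probability does.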

Upon examining the symbols used, it will be clear that if all the
probabilities in a given problem refer either to the whole set of objects A or to
some of its subsets, distinguished by some properties $B_1, B_2, \ldots$,
then the omission of reference to A would not cause misunderstandings.
Thus, we will occasionally write simply

$$P\{B_2\} \text{ for } P\{B_2 \mid A\} \quad\text{and}\quad P\{B_2 \mid B_1\} \text{ for } P\{B_2 \mid AB_1\},$$

etc. Similarly, when discussing the numbers of elements in the
fundamental probability set, it may be more convenient to write simply n
instead of $n(A)$, and $n(B)$ instead of $n(AB)$, etc.

2-4. Further illustrations

2-4-1. A controversial problem. We will now solve a problem which was
the subject of some controversy (see Chapter 1, ref. [8], p. 301, and ref.
[16], p. 322). The traditional formulation of the problem is as follows. An
experiment E is to be performed using a bag and two boxes. The bag
contains three numbered balls, two with No. 1 and one with No. 2. The
boxes are numbered. Box No. 1 contains four balls of which one is black
and three are white. Box No. 2 contains five balls of which four are black
and one white.

The experiment E consists of two consecutive draws of balls. First
a ball is drawn from the bag and then another ball is drawn, either from
box No. 1 or from box No. 2, according to the number on the ball drawn
from the bag. It is required to compute the probability, say $P_b$, that the
second draw in the experiment E will yield a black ball.

As was emphasized already, the correct solution of a problem in
probability requires an unambiguous definition of the fundamental probability
set. It must be realized that all the details of the present problem referring
to the bag, the boxes, and the two consecutive draws are no more than
somewhat picturesque and indirect descriptions of, or hints on, the
nature of the F.P.S. contemplated. Therefore, the solution of the problem
must begin by deciphering these hints and translating them into exact
terms.

When trying to decide what the “objects A” forming the F.P.S. are,
one might perhaps be tempted to consider that they are the nine black
and white balls contained in the two boxes. Such an assumption, however,
would entirely ignore the bag and its contents, which are specifically
mentioned in the problem. The appropriate interpretation seems to be that
the probability $P_b$ is meant to refer to the final outcome of the experiment
E, composed of two consecutive draws of balls, and to answer the question
“how frequently will the experiment E end in drawing a black ball?”

Consider, then, the set $S_0$ of outcomes of the experiment E or, more
simply, the set $S_0$ of pairs of draws as described in the problem, and treat
this set as the all-inclusive F.P.S. Let N stand for the total number of
pairs of draws in $S_0$.

Each pair of draws in $S_0$ is distinguished by a combination of possible
outcomes of each draw, which we may denote by the symbols “1” and “2”,
“b” and “w”, respectively. Let $n(1,b)$ denote the number of pairs of draws
in $S_0$ of which the first (from the bag) gave ball No. 1 and the second
(from box No. 1) gave the black ball. Further, let $n(1,w)$, $n(2,b)$ and
$n(2,w)$ stand for similarly defined numbers of the three other categories
of pairs of draws in $S_0$. Obviously

$$P_b = P\{b \mid A = \text{pair of draws}\} = \frac{n(1,b) + n(2,b)}{N},$$

and the problem consists in determining the numbers $n(1,b)$ and $n(2,b)$
in terms of N. To do so, let us interpret the conditions of the problem in
relation to the fundamental probability set $S_0$.

The description of the bag and the structure of the experiment E imply
that in the all-inclusive set $S_0$ of pairs of draws, the probability of the
property “1” is

$$P\{1 \mid A\} = \frac{n(1,b) + n(1,w)}{N} = \frac{2}{3}.$$

Similarly

$$P\{2 \mid A\} = \frac{n(2,b) + n(2,w)}{N} = \frac{1}{3}.$$


These two equations exhaust the information involved in the description
of the bag, and we may now turn to interpreting the conditions relating
to the two boxes.

It will be noticed that whatever probabilities may be implied by the
description of box No. 1, these probabilities will refer, not to the whole
set $S_0$ of pairs of draws, but only to its subset, say $S_1$, composed of those
pairs in which the first draw gave ball No. 1. In other words, the
description of box No. 1 applies to relative probabilities of “b” and “w”
given the property “1”.

From the composition of box No. 1 we have

$$P\{b \mid A1\} = \frac{n(1,b)}{n(1,b) + n(1,w)} = \frac{1}{4}.$$

Similarly, for box No. 2

$$P\{b \mid A2\} = \frac{n(2,b)}{n(2,b) + n(2,w)} = \frac{4}{5}.$$


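These two equations, combined with the two previous ones implied by
the description of the bag, give in turn

$$n(1,b) = \tfrac{1}{4}\bigl(n(1,b) + n(1,w)\bigr) = \tfrac{1}{4}\cdot\tfrac{2}{3}N = \tfrac{1}{6}N,$$

$$n(2,b) = \tfrac{4}{5}\bigl(n(2,b) + n(2,w)\bigr) = \tfrac{4}{5}\cdot\tfrac{1}{3}N = \tfrac{4}{15}N.$$

Substituting these results into the expression for $P_b$ we finally get the
answer

$$P_b = \frac{1}{6} + \frac{4}{15} = \frac{13}{30}.$$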

Since both $n(1,b)$ and $n(2,b)$ must be integers, it is obvious
that N must be divisible both by 6 and by 15. Thus, to satisfy the
conditions of the problem, the number N of pairs of draws in the fundamental
probability set $S_0$ must be a multiple of 30, the smallest possible value
being N = 30.



If N = 30, then the numbers of different categories of pairs of draws in
the F.P.S. are as follows.

              “b”    “w”    Totals
   “1”          5     15       20
   “2”          8      2       10
   Totals      13     17       30



We may now summarize in precise terms the conditions of the problem
just solved and its solution: if the elements of the fundamental probability
set $S_0$ are distinguished by the four combinations of two pairs of
contrasting properties (1,b), (1,w), (2,b), (2,w), and if

$$P\{1\} = \tfrac{2}{3}, \qquad P\{2\} = \tfrac{1}{3},$$

while

$$P\{b \mid 1\} = \tfrac{1}{4}, \qquad P\{b \mid 2\} = \tfrac{4}{5},$$

then (i) the number of elements in $S_0$ must be a multiple of 30 and (ii) the
probability

$$P\{b\} = \tfrac{13}{30}.$$
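The solution can be checked by a short computation. In this sketch (names such as `category_counts` and `smallest_admissible_N` are hypothetical), the four category counts are computed from the stated probabilities for a trial value of N, and N is increased until all four counts are whole numbers.

```python
from fractions import Fraction

P_FIRST = {1: Fraction(2, 3), 2: Fraction(1, 3)}          # description of the bag
P_BLACK_GIVEN = {1: Fraction(1, 4), 2: Fraction(4, 5)}    # descriptions of the boxes


def category_counts(N):
    """Numbers n(1,b), n(1,w), n(2,b), n(2,w) implied by the conditions for an
    F.P.S. of N pairs of draws (fractional if N is not admissible)."""
    counts = {}
    for box, p_box in P_FIRST.items():
        pairs_with_box = p_box * N            # pairs whose first draw gave this number
        counts[(box, "b")] = P_BLACK_GIVEN[box] * pairs_with_box
        counts[(box, "w")] = (1 - P_BLACK_GIVEN[box]) * pairs_with_box
    return counts


def smallest_admissible_N():
    """Smallest N for which all four counts are whole numbers."""
    N = 1
    while any(c.denominator != 1 for c in category_counts(N).values()):
        N += 1
    return N


N = smallest_admissible_N()
counts = category_counts(N)
P_b = (counts[(1, "b")] + counts[(2, "b")]) / N
print(N, {key: int(value) for key, value in counts.items()}, P_b)
```

The printed counts (5, 15, 8, 2 for N = 30) reproduce the table, and $P_b$ comes out as 13/30.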




2-4-2. Performance characteristic of rule (i) of Chevalier de Méré. In
subsection 1-3-4 we considered rule (i) of inductive behavior which
Chevalier de Méré might have adopted when playing with doubtful dice.
We will now compute the performance characteristic of this rule.

The conditions of the problem are as follows. In each game de Méré is
allowed to bet either on or against the occurrence of a certain event E
(at least one “double six” in twenty-four throws of two dice). The
probability P of E is assumed to be the same in all games. Because the value
of P is unknown, de Méré, before betting, witnesses three games and
counts the number X of games which yield E. If X = 1, he takes action
$a_1$, which is to bet against E in all the following games. If X = 2, de Méré
takes action $a_2$, which is to bet on E. Finally, if X = 0 or 3, de Méré
considers that his adversary is a crook and takes strong action $a_3$. The
performance characteristic of this rule of behavior is composed of three
probabilities, $p(a_1 \mid P)$, $p(a_2 \mid P)$, and $p(a_3 \mid P)$, that, after observing X,
de Méré will take action $a_1$, $a_2$ and $a_3$, respectively.

The all-inclusive fundamental probability set $S_0$ is composed of triplets
of games. Each game has the possible outcomes E and not-E. The number
N of elements in the set $S_0$, as well as the number of these elements in the
particular categories into which $S_0$ is divided, are not given directly and
we have to obtain them from some equations based on the conditions of
the problem.

We begin by noticing that all the elements of $S_0$ can be classified into
eight categories according to whether or not E occurred in the first game,
in the second game, and in the third. If each of the three letters α, β and γ
is allowed to stand for zero or unity, then $S(\alpha,\beta,\gamma)$ will adequately
describe each of the eight categories within $S_0$. Thus, for example, $S(1,0,1)$
will stand for the category of triplets in $S_0$ in which E occurred in the first
and the third games but not in the second. Let $n(\alpha,\beta,\gamma)$ denote the number
of elements in $S(\alpha,\beta,\gamma)$. With this notation

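$$\begin{aligned}
p(a_1 \mid P) &= \frac{n(1,0,0) + n(0,1,0) + n(0,0,1)}{N},\\
p(a_2 \mid P) &= \frac{n(0,1,1) + n(1,0,1) + n(1,1,0)}{N},\\
p(a_3 \mid P) &= \frac{n(0,0,0) + n(1,1,1)}{N}.
\end{aligned} \tag{2-4-1}$$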


To solve the problem it is sufficient now to use the hypothesis that, no 
matter which of the three games of the triplet we consider, the probability 
of E is the same irrespective of the outcomes of the other games in the 
triplet. 

This hypothesis is interpreted to mean that the relative probability of
E in any one of the games, given an arbitrary outcome of the other two
games, is always equal to P. In terms of the symbols $n(\alpha,\beta,\gamma)$, this implies
that


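$$P\{\alpha = 1 \mid \beta,\gamma\} = \frac{n(1,\beta,\gamma)}{n(1,\beta,\gamma) + n(0,\beta,\gamma)} = P \tag{2-4-2}$$

whatever be β and γ. Similarly

$$P\{\beta = 1 \mid \alpha,\gamma\} = P\{\gamma = 1 \mid \alpha,\beta\} = P. \tag{2-4-3}$$

From the first of these equations we have

$$n(1,\beta,\gamma) = n(0,\beta,\gamma)\,\frac{P}{1-P}, \tag{2-4-4}$$

which, for β = γ = 0, gives

$$n(1,0,0) = n(0,0,0)\,\frac{P}{1-P}. \tag{2-4-5}$$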



Applying (2-4-3) in the same way shows that the right-hand side is also the
value of $n(0,1,0)$ and of $n(0,0,1)$. Using (2-4-4) and setting β = 1 and
γ = 0, we get

$$n(1,1,0) = n(0,1,0)\,\frac{P}{1-P} = n(0,0,0)\left(\frac{P}{1-P}\right)^{2}. \tag{2-4-6}$$

Similar operations performed on (2-4-3) give

$$n(0,1,1) = n(1,0,1) = n(0,0,0)\left(\frac{P}{1-P}\right)^{2}. \tag{2-4-7}$$

Finally, upon setting β = γ = 1 in (2-4-4) and using (2-4-7), we get

$$n(1,1,1) = n(0,0,0)\left(\frac{P}{1-P}\right)^{3}. \tag{2-4-8}$$

But the sum of all eight numbers $n(\alpha,\beta,\gamma)$ must equal N. It follows that

$$n(0,0,0)\left\{1 + 3\,\frac{P}{1-P} + 3\left(\frac{P}{1-P}\right)^{2} + \left(\frac{P}{1-P}\right)^{3}\right\} = N.$$

The expression in braces is the third power of the sum

$$1 + \frac{P}{1-P} = \frac{1}{1-P}.$$

Thus, it follows that

$$n(0,0,0) = (1-P)^{3}N.$$

Substituting this value in formulae (2-4-5) through (2-4-8), we obtain
the values of all the symbols $n(\alpha,\beta,\gamma)$ in terms of N. On substituting into
the expressions (2-4-1) of $p(a_i \mid P)$ we obtain

$$\begin{aligned}
p(a_1 \mid P) &= 3(1-P)^{2}P,\\
p(a_2 \mid P) &= 3(1-P)P^{2},\\
p(a_3 \mid P) &= (1-P)^{3} + P^{3}.
\end{aligned} \tag{2-4-9}$$

Here P is the arbitrary probability of the event E in each of the three
games to be witnessed by Chevalier de Méré. As long as we deal with the
elementary theory of probability with finite F.P.S.’s, P must necessarily
be a rational number. However, in the general theory of probability this
restriction no longer holds and each probability may have irrational as
well as rational values.

The graphs in Figure 1 correspond to the three solutions (2-4-9), with
P varying continuously from zero to unity.
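Formulae (2-4-9) can be checked numerically with exact fractions. The sketch below has illustrative function names and takes N = 1, allowing fractional "counts", which is harmless here because only ratios of counts are used. It builds the eight counts $n(\alpha,\beta,\gamma)$ in the form derived above, confirms hypothesis (2-4-2), and adds up the categories according to rule (i) as in (2-4-1).

```python
from fractions import Fraction
from itertools import product


def triplet_counts(P, N=1):
    """Counts n(alpha, beta, gamma) in the form derived in the text:
    N (1 - P)^3 (P/(1 - P))^k = N P^k (1 - P)^(3 - k), with k = alpha + beta + gamma."""
    return {t: N * P ** sum(t) * (1 - P) ** (3 - sum(t))
            for t in product((0, 1), repeat=3)}


def action(triplet):
    """Rule (i): X = 1 -> a1, X = 2 -> a2, X = 0 or 3 -> a3."""
    return {1: "a1", 2: "a2"}.get(sum(triplet), "a3")


def performance_characteristic(P):
    """p(a_i | P), obtained by adding up the categories as in (2-4-1)."""
    counts = triplet_counts(P)
    N = sum(counts.values())
    result = {"a1": Fraction(0), "a2": Fraction(0), "a3": Fraction(0)}
    for triplet, count in counts.items():
        result[action(triplet)] += count / N
    return result


P = Fraction(1, 2)
counts = triplet_counts(P)
# Hypothesis (2-4-2): the relative probability of E in the first game is P,
# whatever the outcomes of the other two games.
for beta, gamma in product((0, 1), repeat=2):
    assert counts[(1, beta, gamma)] / (counts[(1, beta, gamma)] + counts[(0, beta, gamma)]) == P

pc = performance_characteristic(P)
print({name: str(p) for name, p in pc.items()})   # {'a1': '3/8', 'a2': '3/8', 'a3': '1/4'}
```

For P = .5 the value of $p(a_3 \mid P)$ is 1/4, the figure used in the interpretation that follows.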

Before concluding, we may ask whether or not the F.P.S. used in this
solution can be interpreted in terms of situations encountered by Chevalier
de Méré. The answer is in the affirmative. In the long gambling experience
of de Méré (and also of other gamblers), we may visualize a number of
cases where the dice used and also the method of throwing were practically
the same, corresponding to the same value of P. These cases, picked
out of general gambling experience, could then be considered as corresponding
to the fundamental probability set $S_0$. Let, for example, P = .5. Then
$p(a_3 \mid P = .5) = .5^3 + .5^3 = .25$. The practical meaning of this result can be
summarized by stating that if in general human experience there exists a set
of cases, say $S_0$, where triplets of games satisfy the conditions stated above
with P = .5, then in 25 percent of all the instances of set $S_0$, rule (i) would
prescribe action $a_3$.

The method of computing probabilities used in this section is called
the direct method. The reason for this terminology is that each time one
begins with the definition of probability as the quotient $n(AB)/n(A)$ and
proceeds to an effective enumeration of the elements of the F.P.S. At
times, this method is the simplest one to use. In other cases it is somewhat
long and cumbersome. In order to simplify the solution of the more
complicated cases, we will introduce a few new concepts and prove several
simple theorems.

PROBLEMS AND EXERCISES 

Each of the following problems is conveniently solved by the direct
method. Since the majority of problems are formulated in the traditional
vague form, in terms of hands of cards, etc., the student should begin by
translating the statements into precise terminology and, above all, by
defining the F.P.S. to which any given probability is meant to refer. If
more than one F.P.S. is involved, then all of them must be defined in the
solution.

1. (a) What is the probability $P_1$ that a card drawn from a full deck
of 52 cards will be an ace? (b) What is the probability $P_2$ that the third
card in an ordered hand (see 2-2-3) of 13 cards will be an ace?

Answer: (a) 1/13, (b) 1/13.


2. A set S contains N objects A, of which $n \le N$ possess a distinctive
property B (B-objects, for short). The number n is greater than two.
(a) What is the probability $P_1$ that in an ordered group G composed of
m > n objects selected from S, the ith and jth elements will be B-objects?
Here $i \ne j$ and $i, j \le m$. (b) What is the probability $P_2$ that in the ordered
group G, the ith, the jth, and the kth elements will be B-objects? Here
i, j, and k are all different and none of them exceeds m.

Answer: (a) $\dfrac{n(n-1)}{N(N-1)}$, (b) $\dfrac{n(n-1)(n-2)}{N(N-1)(N-2)}$.


3. What is the probability $P_1$ that a hand H of 13 cards out of a full
deck of 52 cards will contain exactly five spades? One of the four players
in bridge has a hand $H_1$ (of 13 cards out of a full deck of 52), which includes
exactly four spades. What is the probability $P_2$ that the hand $H_2$ of his
partner includes exactly five spades?

Answer: (a) $C^{5}_{13}C^{8}_{39}/C^{13}_{52}$, (b) $C^{5}_{9}C^{8}_{30}/C^{13}_{39}$.

4. (a) What is the probability $P_1$ that a 13-card hand in a bridge game
will contain at least seven cards of any one suit, of which seven cards
form a “royal flush”: ace, king, queen, etc., and down to eight? (b) What
is the probability $P_2$ that a hand in a bridge game will contain exactly
seven cards of any one suit forming a royal flush? (c) One of the bridge
players has eight spades, of which seven form a royal flush, two small
hearts, and three small diamonds. What is the probability $P_3$ that his
partner has a seven-card royal flush in any one suit?

Answer: (a) $4C^{6}_{45}/C^{13}_{52}$, (b) $4C^{6}_{39}/C^{13}_{52}$, (c) $3C^{6}_{32}/C^{13}_{39}$.

5. (a) What is the probability $P_1$ that a 13-card hand H will include
two royal flushes of five cards each, the remaining three cards being
arbitrary? (b) What is the probability $P_2$ that the hand H will contain
exactly one royal flush of five cards in any one suit?

Answer: (a) $C^{2}_{4}C^{3}_{42}/C^{13}_{52}$, (b) $4\left[\dfrac{C^{8}_{47}}{C^{13}_{52}} - \dfrac{3C^{3}_{42}}{C^{13}_{52}}\right]$.

6. A rooming house has n = 6 single rooms. For these rooms there
are N = 10 applicants, of which $N_1 = 6$ are men and $N_2 = 4$ are women.
The rooming house follows the policy of “first come first served.” (a) What
is the probability $P_1$ that all the six male applicants and none of the
females will get accommodations? (b) What is the probability $P_2$ that
four males and two females will get accommodations? (c) What is the
probability $P_3$ that at least two of the four females will get
accommodations?

Answer: (a) 1/210, (b) 3/7, (c) 37/42.
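A brute-force check of Problem 6, under the assumption (not spelled out in the problem) that every order of arrival of the ten applicants is equally likely, so that every set of six first-comers is equally likely. The names `APPLICANTS`, `ROOMS` and `prob_number_of_women` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

APPLICANTS = ["M"] * 6 + ["W"] * 4   # N1 = 6 men, N2 = 4 women
ROOMS = 6


def prob_number_of_women(condition):
    """Fraction of the equally likely sets of ROOMS successful applicants in which
    the number of women accommodated satisfies `condition`."""
    outcomes = list(combinations(range(len(APPLICANTS)), ROOMS))
    hits = sum(1 for chosen in outcomes
               if condition(sum(APPLICANTS[i] == "W" for i in chosen)))
    return Fraction(hits, len(outcomes))


P1 = prob_number_of_women(lambda women: women == 0)
P2 = prob_number_of_women(lambda women: women == 2)
P3 = prob_number_of_women(lambda women: women >= 2)
print(P1, P2, P3)   # 1/210 3/7 37/42
```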

7. In a game of poker I hold the 6, 7 and 9 of spades and the J and 3 of
clubs. I do not know what my opponents hold. I discard the two clubs and
draw two new cards hoping to get a run of five consecutive spades. What is
the probability of success? Answer: .00185.

8. Bill and Jasper play a game. Jasper will throw five dice. If three or
more of them do not turn up the same face then Bill wins 10 cents. Otherwise,
Bill loses 40 cents. What is the probability that Bill will win?

9. (H. Scheffé) A club has 100 members. Among them there are 50
lawyers and 50 liars. The number of members who are neither lawyers nor
liars is 20. A committee of five members is chosen by lot.

(a) What is the probability that the committee contains exactly three
lawyers? Answer: .319.

(b) Exactly three liars?

(c) Exactly three lawyers who are liars?

(d) At least three lawyers who are liars?

10. Four cards, marked 1, 2, 3, 4, respectively, are shuffled well and then 
dealt, one to each of four places marked 1, 2, 3, 4. What is the probability 
that none of the cards occupies the place corresponding to its number? 

Answer: .375. 
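Problem 10 is small enough to settle by listing all 4! = 24 deals, assuming each deal is equally likely. The variable names below are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# deal[k - 1] is the card placed on place k; every deal is taken as equally likely.
deals = list(permutations([1, 2, 3, 4]))
no_match = [deal for deal in deals
            if all(card != place for place, card in enumerate(deal, start=1))]
p = Fraction(len(no_match), len(deals))
print(len(no_match), len(deals), p, float(p))   # 9 24 3/8 0.375
```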

11. I wish to pay the 25¢ toll on the Bridge. I shall take three coins from
my purse at random. The purse contains 1 quarter, 2 dimes, 3 nickels and
1 penny. What is the probability that I shall pull out less than enough to
pay the toll? Answer: .486.

12. A case of 100 toy autos contains 10 defective autos. If five are
selected at random and shipped to a store, what is the probability that the
store will receive at least one defective toy?

13. Peter offers Paul the following game. Peter will fill a bag with n
balls, some red and the rest green, in a proportion which Peter will select
by himself. The game will consist of drawing a ball out of the bag, and Paul
may bet on either of the two colors. If he decides to bet on red, this will be
described as action $a_1$. If he bets on green, this will be described as action
$a_2$. Before the actual betting, Paul is allowed to draw once or twice from
the bag and to record the colors observed. In either case, the extracted ball
is replaced in the bag before the next draw and the bag is well shaken. Paul
considers the following two rules of inductive behavior.

(i) The first rule is based on the assumption that before the actual
betting Paul will draw just one ball from the bag. If the ball drawn is red then
Paul will bet on red in the next draw. Otherwise Paul’s bet will be on green.

(ii) The second rule is based on the assumption that before betting Paul
will draw twice from the bag. Denoting by rr, rg, gr and gg the four possible
outcomes of the two draws, according to the colors of the balls drawn, rule
(ii) is:

if the two draws give rr then take action $a_1$,

if the two draws give rg then take action $a_1$,

if the two draws give gr then take action $a_2$,

if the two draws give gg then take action $a_2$.

(a) Denote by p the proportion of red balls in the bag and obtain the
formulae giving the performance characteristics of the two rules of inductive
behavior (i) and (ii). Substitute p = 0, .1, .2, ..., .8, .9 and 1 and make a
graph. Which of the two rules is preferable to Paul? Why?




(b) Assume that Peter knows that Paul intends to use rule (i) and that he
decides that the total number of balls in the bag shall be six. How many red
balls should he put in the bag in order to have the best chance of winning
the game? What is the probability of Peter’s winning if he puts 2 red and 4
green balls in the bag?

(c) Assume that Paul decides to use rule (iii) of inductive behavior, defined
as follows:

take action $a_1$ when the first two draws give the result rr,

take action $a_2$ otherwise.

Obtain the formulae for the performance characteristic of rule (iii) and
make a plot. What is the proportion, say $p_0$, of red balls in the bag which
ensures the greatest probability of Peter’s winning the game? What is the
value of this probability? Partial Answer: $p_0$ = .61 (approximately).

14. The contract between a manufacturer of tires and a consumer
provides that, out of each lot of 100 tires, two tires will be selected at random
and subjected to a test. The test is destructive, so that a tire which has been
tested has no market value. In negotiations for the contract the following
three rules (of inductive behavior) are considered.

Rule (i). If both of the two tires tested are defective then the whole lot
will be rejected by the consumer. Otherwise the lot will be accepted by the
consumer.

Rule (ii). If both of the two tires tested are good then the lot will be
accepted by the consumer. Otherwise it will be rejected.

Rule (iii). If both of the two tires tested are defective, the lot will be
rejected. If both of the two tires tested are good, the lot will be accepted. If
one of the two tires is good and the other defective, then a third tire will be
selected from the 98 tires remaining in the lot and the whole lot will be
accepted or rejected according to whether or not this third tire is
satisfactory.

(a) Denote by D the number of defective tires in the lot and obtain
formulae representing the performance characteristics of the three rules
considered. Substitute D = 0, 10, 20, 30, 40, 50 and make graphs. Which of the
three rules is most satisfactory to the consumer? Which of the rules is most
favorable to the manufacturer? Why? If you were the consumer, would you
consider any of the above rules acceptable?

(b) Perform a sampling experiment imitating the acceptance procedure
of tires described above. Place in a box 100 tags of which either 0, or 10, ...,
or 50 are red and represent defective tires. Draw simultaneously 2 tags from
the box and, according to rule (ii), consider the lot of tires accepted if neither
of the tags drawn is red. Otherwise the lot will be rejected. Repeat this
experiment ten times (imitate the testing of ten lots of tires all having the
same number of defectives). Compare your frequency of accepting the lot
with the corresponding probability shown on the plot of the performance
characteristic.

(c) Perform a similar sampling experiment corresponding to rule (iii).

15. The Nogood Novelty Mfg. Co. wants to advertise that its egg beaters
are approved by Y Magazine. In order to obtain its approval, the magazine
requires that the Nogood Co. submit a lot of ten egg beaters to the Y
Magazine Institute. Y will choose at random and test two of these ten
beaters, and if these two are satisfactory, will give its endorsement.

(a) If five of the ten egg beaters are defective, what is the probability that
the Nogood Egg Beaters will be approved by Y Magazine? Answer: .22.

(b) Let m equal the number of defective beaters in the lot of ten. Plot the
performance characteristic of the rule of inductive behavior above. How
large may m be and yet have the probability of endorsement by Y
Magazine at least equal to .5?

(c) Comment on this rule of inductive behavior. Is the test required by
Y Magazine stringent enough to warrant advertising its endorsement?

2-5. Certain symbols and some formulae
from elementary algebra

2-5-1. Summation symbol. When discussing sums of a considerable number
of terms, we are frequently embarrassed by the necessity for writing
all these terms. Consider, for example, the following formula

$$1 + 2 + 3 + \cdots + (n-1) + n = \tfrac{1}{2}n(n+1),$$

which gives the sum of the first n integers beginning with unity. A similar
formula for the sum of squares is

$$1 + 4 + 9 + \cdots + (n-1)^{2} + n^{2} = \tfrac{1}{6}n(n+1)(2n+1).$$

Imagine, further, that we have to perform some operations on such sums.
It is obvious that the necessity for writing a number of terms separated
by dots meaning “etc.” is inconvenient, especially if, on occasion, the value
of n may be 1 or 2 so that, in reality, the sum under consideration contains
fewer terms than are written. To avoid such inconveniences a special
symbol is adopted to denote the sum of an arbitrary number of terms all
of which are of the same type. Let $T_k$ represent the kth term of a given
sum and let the sum involve all such terms beginning with the mth and
ending with the nth, with $m \le n$. Then the symbol in question is

$$\sum_{k=m}^{n} T_k = T_m + T_{m+1} + \cdots + T_{n-1} + T_n.$$

The main part of the symbol is the Greek letter capital sigma, Σ,
which stands for the word “sum” or “summation.” The whole formula
reads: “sum (or summation) of terms $T_k$, from k = m to n.” Of course,
there is nothing particular about the letter k used as the subscript; the
two symbols $\sum_{i=m}^{n} T_i$ and $\sum_{k=m}^{n} T_k$ mean exactly the same sum.

Using the summation symbol, the two formulae written at the beginning
of this subsection may be rewritten as

$$\sum_{k=1}^{n} k = \tfrac{1}{2}n(n+1)$$

and

$$\sum_{k=1}^{n} k^{2} = \tfrac{1}{6}n(n+1)(2n+1).$$

In the first case, the typical kth term is k itself. In the second case it
is the square of k. Instead of being interested in the sum of integers from
unity to n, we may be interested in this sum from m to n. This is easily
written as

$$\sum_{k=m}^{n} k = \tfrac{1}{2}n(n+1) - \tfrac{1}{2}(m-1)m.$$
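The closed formulas can be compared with direct, term-by-term summation. This is a small sketch; the helper names `sum_first`, `sum_squares` and `sum_from` are illustrative.

```python
def sum_first(n):
    """Sum of k for k = 1..n, by the closed formula n(n + 1)/2."""
    return n * (n + 1) // 2


def sum_squares(n):
    """Sum of k**2 for k = 1..n, by the closed formula n(n + 1)(2n + 1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6


def sum_from(m, n):
    """Sum of k for k = m..n: the first n integers minus the first m - 1."""
    return sum_first(n) - sum_first(m - 1)


# Compare each closed formula with a direct, term-by-term summation.
for n in range(1, 30):
    assert sum_first(n) == sum(range(1, n + 1))
    assert sum_squares(n) == sum(k * k for k in range(1, n + 1))
    for m in range(1, n + 1):
        assert sum_from(m, n) == sum(range(m, n + 1))

print(sum_first(100), sum_squares(10), sum_from(5, 10))   # 5050 385 45
```

Integer division `//` is exact here because the products in the numerators are always divisible by 2 and 6, respectively.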


Occasionally, circumstances require consideration of sums of terms which
can be written conveniently using not one but two subscripts, say i and j.
Such, for example, is the case of the sum of the first m powers, beginning
with the first power, of all the integers from 2 to n. Here the typical term
can be written as $i^j$ or $s^t$, etc. In order to symbolize the sum of all such
terms with i (or s) running through all integers from 2 to n and with
j (or t) running through all integers from 1 to m, we have to use not one but
two letters sigma, as follows:

$$\sum_{i=2}^{n}\sum_{j=1}^{m} i^{j}.$$

The reader will perceive that this formula does not involve any new
concept but only the repeated application of the summation symbol
already introduced. In fact, applying the definition first to $\sum_{i=2}^{n}$ and then
to $\sum_{j=1}^{m}$, the above double sum can be written as follows:

$$\begin{aligned}
\sum_{i=2}^{n}\sum_{j=1}^{m} i^{j} &= \sum_{j=1}^{m} 2^{j} + \sum_{j=1}^{m} 3^{j} + \cdots + \sum_{j=1}^{m} n^{j}\\
&= 2 + 2^{2} + 2^{3} + \cdots + 2^{m-1} + 2^{m}\\
&\quad + 3 + 3^{2} + 3^{3} + \cdots + 3^{m-1} + 3^{m}\\
&\quad + \cdots\cdots\\
&\quad + n + n^{2} + n^{3} + \cdots + n^{m-1} + n^{m}.
\end{aligned}$$




It is seen that the compact formula on the left-hand side, describing
exactly the sum considered, has a definite advantage over the bulky
detailed expression displayed on the extreme right. The reader will notice
that interchanging the two summation signs in the above formula leaves
its numerical value unaltered. In fact, such an interchange will result only
in a change in the order in which the particular terms are added:

$$\begin{aligned}
\sum_{j=1}^{m}\sum_{i=2}^{n} i^{j} &= \sum_{i=2}^{n} i + \sum_{i=2}^{n} i^{2} + \cdots + \sum_{i=2}^{n} i^{m}\\
&= 2 + 3 + \cdots + n\\
&\quad + 2^{2} + 3^{2} + \cdots + n^{2}\\
&\quad + \cdots\cdots\\
&\quad + 2^{m} + 3^{m} + \cdots + n^{m}.
\end{aligned}$$
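A numerical illustration of the interchange, using the illustrative values n = 5 and m = 4 (not taken from the text): the two nested generator expressions add the same terms in different orders.

```python
n, m = 5, 4   # illustrative values

# Inner sum over j (powers 1..m), outer over i (integers 2..n): the first display.
rows_first = sum(sum(i ** j for j in range(1, m + 1)) for i in range(2, n + 1))
# Summation signs interchanged: inner sum over i, outer over j.
columns_first = sum(sum(i ** j for i in range(2, n + 1)) for j in range(1, m + 1))

print(rows_first, columns_first)   # 1270 1270
```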

2-5-2. Multiplication symbol. Whatever was said above about the
inconvenience in handling sums of either many or an indefinite number of
terms applies equally to products. Therefore, for the same reasons, there
is a generally accepted symbol to denote the product of several factors all
having the same form for which a general formula is easily written. This
symbol is built up on the same principle as the summation symbol, with
the Greek letter Π, capital pi, taking the place of capital sigma. Thus,
whatever may be the definition of $T_k$, the product of several factors of
this type is written as

$$\prod_{k=m}^{n} T_k = T_m T_{m+1} \cdots T_{n-1} T_n.$$

If the reader has mastered the use of the summation symbol, then the
handling of the multiplication symbol presents no difficulty.

2-5-3. Factorial. Among the products of similar factors there is one
which is particularly important and for which a special symbol has been
adopted. This is the product of all consecutive integers beginning
with unity and ending with any number n or, in the above notation,
$\prod_{k=1}^{n} k$. A product of this kind is called “n factorial” and is denoted by n!.
Thus, for any positive integer n, the symbol n! is defined by the formula

$$n! = \prod_{k=1}^{n} k.$$

It follows that 1! = 1, 2! = 2, 3! = 6, 4! = 24, etc. Factorials intervene
in many formulae in all branches of mathematics. By analyzing these
formulae it was found convenient to extend the meaning of n! to the case
when n = 0. Namely, the convention accepted is that 0! = 1.

Definition 2-2. The term n factorial and the symbol n! are used to
denote a number depending on n as follows: if n = 0, then 0! = 1; if n is a
positive integer, then

$$n! = \prod_{k=1}^{n} k.$$
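The multiplication symbol and the factorial translate directly into Python's `math.prod` (Python 3.8 or later), which returns 1 for an empty product. This gives one convenient way of seeing why the convention 0! = 1 is natural. The helper names `product_symbol` and `n_factorial` are illustrative; the standard library's `math.factorial` is used only as a cross-check.

```python
from math import factorial, prod


def product_symbol(T, m, n):
    """Pi_{k=m}^{n} T(k).  math.prod returns 1 when the range is empty."""
    return prod(T(k) for k in range(m, n + 1))


def n_factorial(n):
    """n! = Pi_{k=1}^{n} k; for n = 0 the product is empty, which gives 0! = 1."""
    return product_symbol(lambda k: k, 1, n)


print([n_factorial(n) for n in range(5)])                      # [1, 1, 2, 6, 24]
print(all(n_factorial(n) == factorial(n) for n in range(20)))  # True
```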


2-5-4. Arrangements, permutations, and combinations. Consider a set
S composed of $n \ge 1$ different objects

$$S = \{a_1, a_2, \ldots, a_{n-1}, a_n\}.$$

Definition 2-3. The term ordered group or arrangement of k objects
out of the n objects forming S is used to denote a group of k arbitrary objects
selected from S and placed in a specified order, so that one particular object
of the group is designated as the first, another as the second, still another as
the third, etc.

Thus $(a_1, a_2)$ and $(a_2, a_1)$ as well as $(a_2, a_3)$ and $(a_n, a_1)$ are different
arrangements of two objects selected out of the set S. The reader’s
attention is called to the fact that two groups of k objects out of n are
considered as different arrangements when they differ either by at least one
object present in one group and absent in the other, or by the order in
which the objects are arranged, or both. One of the problems treated in this
section is to deduce a formula giving the total number of all different
arrangements of k out of the given n objects. In order to illustrate this
problem, as well as the meaning of the term “arrangements,” we shall
enumerate directly all the different arrangements of k = 2 out of n = 3
objects.

For this purpose we shall imagine some n = 3 objects, e.g., three pencils, black, green, and red (B, G, and R, for short), lying on a table and will visualize all the different ways (i) of selecting k = 2 of these pencils and (ii) of ordering the selected group. It is easy to see that there are three different methods of selecting two pencils out of the given three. These are:

select B and G and leave out R, 

select B and R and leave out G, 

select G and R and leave out B. 

Further, it is also obvious that each group of two objects can be ordered in two different ways only. Thus all different arrangements of two pencils out of three are

(B, G), (G, B), (B, R), (R, B), (G, R), (R, G).

The symbol adopted for the number of all different arrangements of k objects out of n is $A_n^k$. The result just obtained may be written as $A_3^2 = 6$.
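The enumeration of the six arrangements can be reproduced mechanically. This sketch uses Python's `itertools.permutations(iterable, k)`, which yields every ordered group of k distinct items, that is, every arrangement in the sense of Definition 2·3. The pencil labels are taken from the example above.

```python
from itertools import permutations

pencils = ["B", "G", "R"]  # black, green, red

# permutations(pencils, 2) yields every ordered pair of distinct pencils,
# i.e. every arrangement of k = 2 objects out of n = 3.
arrangements = list(permutations(pencils, 2))
print(arrangements)       # [('B', 'G'), ('B', 'R'), ('G', 'B'), ('G', 'R'), ('R', 'B'), ('R', 'G')]
print(len(arrangements))  # 6, i.e. A_3^2 = 6
```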

Definition 2·4. The term permutation of n objects is used to designate a specified order in which these objects are placed with a particular object designated as the first, another as the second, etc.

For example, the three pencils B, G, R just discussed can be placed in the following six different orders:

BRG BGR

RBG RGB

GBR GRB.

The symbol adopted to denote the number of all different permutations of n objects is $P_n$. From the above example it follows that $P_3 = 6$.

Definition 2·5. The term unordered group or combination of k out of the n objects forming the set S is used to denote a group of k arbitrary objects taken from S, irrespective of the order in which these objects may be arranged within the group.

It follows from this definition that two groups, each containing the same number of objects, are considered as different combinations only if these groups differ by at least one object present in one of the groups and absent in the other. Thus, returning to the example of the three pencils, it will be noticed that (B, G) and (G, B) are two different arrangements of two out of three pencils but represent the same unordered group (the same combination) of two pencils out of the given three.

There are two alternative symbols used to denote the number of all different combinations of k objects out of the given n. One symbol is $C_n^k$ and the other is $\binom{n}{k}$. In this book the symbol $C_n^k$, borrowed from the French school of mathematics, is used consistently. Returning once more to the example with the three pencils, it is easy to see that $C_3^2 = 3$.
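The difference between arrangements and combinations shows up in a short Python sketch. It uses `itertools.combinations`, which yields each unordered group once, in the order of the input. Converting each arrangement to a `frozenset` discards the order, so (B, G) and (G, B) collapse into one combination.

```python
from itertools import combinations, permutations

pencils = ["B", "G", "R"]

combos = list(combinations(pencils, 2))        # unordered groups (combinations)
arrangements = list(permutations(pencils, 2))  # ordered groups (arrangements)

print(combos)       # [('B', 'G'), ('B', 'R'), ('G', 'R')]
print(len(combos))  # 3, i.e. C_3^2 = 3

# Forgetting the order: (B, G) and (G, B) become the same frozenset
unordered = {frozenset(pair) for pair in arrangements}
print(len(unordered))  # 3
```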

Remark. Two of the symbols introduced, $A_n^k$ and $C_n^k$, each depend on two numbers: n, the total number of objects in the set S, which is written as a subscript at the bottom of each symbol, and k, the number of objects within the group, written as a superscript at the head of the symbol. In order to remember which of the two numbers is given at the bottom and which at the head of the symbols, it is convenient to think of the process of forming the arrangements or the combinations as lifting k objects out of the given n lying on the table. Hence, the number n of all objects given and lying on the table is written in the symbols below the number k of those lifted to form the group. Incidentally, visualizing the groups formed by lifting several objects is useful in deducing various formulae discussed below and, on occasion, we shall use this manner of speaking.

2·5·5. Formula for the number of arrangements. The deduction of the formula for $A_n^k$ is based on two easy lemmas.

Lemma 2·1. Whatever the integer n ≥ 1, the number $A_n^1$ of different arrangements of one object out of the given n is equal to n, so that $A_n^1 = n$.

This lemma is obvious because the number $A_n^1$ is simply the total number of different ways in which one object can be selected out of the n objects considered. The object selected may be either the first or the second or the third, etc., or the nth. Thus, $A_n^1 = n$.

Lemma 2·2. Whatever n ≥ 2, and whatever k between the limits 1 < k ≤ n, the number $A_n^k$ of all different arrangements of k objects out of n is equal to the number $A_n^{k-1}$ multiplied by n − (k − 1),

$$A_n^k = A_n^{k-1}[n - (k - 1)] = A_n^{k-1}(n - k + 1).$$

In order to prove this lemma, denote by S the set of n given objects $a_1, a_2, \ldots, a_n$ and by $S_{k-1}$ the set of all different arrangements of k − 1 < n objects selected out of S. Let $\alpha$ be one such arrangement and consider the following method of using it to generate the arrangements of k objects. This method will be called the method of addition.

Since k − 1 < n, after lifting from the table the objects forming $\alpha$, there will be several, namely n − (k − 1), objects left. In order to use $\alpha$ to create arrangements of k objects, we shall select one object not belonging to $\alpha$ and add it to the end of $\alpha$. It is obvious that the total number of different arrangements of k objects each obtained in this manner out of the same arrangement $\alpha$ is equal to the number of objects belonging to S but not belonging to $\alpha$, i.e., [n − (k − 1)]. To illustrate this point, imagine that the arrangement $\alpha$ consists of the first k − 1 objects of S placed in their original order, so that

$$\alpha = (a_1, a_2, \ldots, a_{k-1});$$

then the different arrangements of k objects obtained from $\alpha$ by the method of addition are, say,

$$\begin{aligned}
\beta_1(\alpha) &= \alpha\, a_k,\\
\beta_2(\alpha) &= \alpha\, a_{k+1},\\
&\;\;\vdots\\
\beta_{n-(k-1)}(\alpha) &= \alpha\, a_n.
\end{aligned}$$




The reasoning just applied to $\alpha$ may be repeated and applied to any other arrangement $\alpha'$ of k − 1 objects out of the n given, and it will be seen that the method of addition will yield the same number n − (k − 1) of different arrangements of k objects. Denote by $\beta_m(\alpha')$ the mth arrangement obtained from $\alpha'$. It is obvious that if $\alpha$ and $\alpha'$ are different, then every arrangement $\beta_m(\alpha)$ will be different from every arrangement $\beta_r(\alpha')$. In fact, whatever m and r, there will be some difference between $\beta_m(\alpha)$ and $\beta_r(\alpha')$ due to the difference between $\alpha$ and $\alpha'$.

Let $S_k$ denote the total set of different arrangements of k objects out of the given n which can be generated by the method of addition using all the different arrangements of k − 1 objects. Obviously, the total number of arrangements forming $S_k$ is $A_n^{k-1}[n - (k - 1)]$, which is exactly the value of $A_n^k$ asserted by Lemma 2·2. However, the method of addition just described is one of many possible methods of using arrangements of k − 1 objects to generate those of k objects, and it may be doubted whether or not $S_k$ contains all the different arrangements of k out of the n objects given. Thus, in order to prove Lemma 2·2, we have to show that, whatever the arrangement $\beta$ of k objects selected out of S, it is necessarily included in the set $S_k$. For this purpose consider the first k − 1 elements of $\beta$.

Considered in the order in which they appear in $\beta$, these k − 1 objects form an arrangement, say $\alpha''$, of k − 1 out of the n objects of S. Thus $\beta$ is obtained by adding to $\alpha''$ one of the remaining n − (k − 1) objects. Since $\alpha''$ must belong to $S_{k-1}$ by definition of this set, it follows that $\beta$ must be generated in the process of forming $S_k$. This proves the assertion that $A_n^k = A_n^{k-1}(n - k + 1)$.
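The method of addition used in the proof of Lemma 2·2 can be simulated directly. In this sketch the helper `extend_by_addition` and the five object labels are hypothetical. For n = 5, the assertions check the three claims of the proof:

- the count grows by the factor n − k + 1;
- no arrangement is produced twice;
- every arrangement of k objects, as listed by `itertools.permutations`, is reached.

```python
from itertools import permutations

def extend_by_addition(prev, objects):
    """Method of addition: append to each arrangement alpha (a tuple of k-1
    objects) every object of S that does not already belong to alpha."""
    return [alpha + (obj,) for alpha in prev for obj in objects if obj not in alpha]

objects = ("a1", "a2", "a3", "a4", "a5")   # hypothetical set S with n = 5
n = len(objects)

S_k = [(obj,) for obj in objects]           # S_1: one-object arrangements (Lemma 2·1)
counts = {1: len(S_k)}
for k in range(2, n + 1):
    new = extend_by_addition(S_k, objects)
    assert len(new) == len(S_k) * (n - k + 1)          # Lemma 2·2
    assert len(set(new)) == len(new)                   # no arrangement generated twice
    assert set(new) == set(permutations(objects, k))   # every arrangement is reached
    S_k = new
    counts[k] = len(S_k)

print(counts)  # {1: 5, 2: 20, 3: 60, 4: 120, 5: 120}
```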

Theorem 2·1. Whatever the numbers n ≥ k ≥ 1,

$$A_n^k = \frac{n!}{(n-k)!}.$$

The proof of this theorem consists in using Lemma 2·2 and writing

$$\begin{aligned}
A_n^k &= A_n^{k-1}(n - k + 1),\\
A_n^{k-1} &= A_n^{k-2}(n - k + 2),\\
A_n^{k-2} &= A_n^{k-3}(n - k + 3),\\
&\;\;\vdots\\
A_n^3 &= A_n^2(n - 2),\\
A_n^2 &= A_n^1(n - 1).
\end{aligned}$$

Multiplying these formulae, canceling equal terms on both sides, and using Lemma 2·1, we find

$$A_n^k = A_n^1(n-1)(n-2)\cdots(n-k+1) = n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!},$$

which proves the theorem.

Remark. The reader will notice that, in order that the above formula for $A_n^k$ can be written for all values of k beginning with unity and ending with n, we need the convention that 0! = 1.
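As a numerical check of Theorem 2·1, the following sketch computes $A_n^k$ in two ways and compares the results for small n. The first way uses Lemma 2·1 and repeated application of Lemma 2·2; the second uses the closed formula. The function names are our own. Python 3.8+ also provides `math.perm(n, k)` for the same quantity.

```python
import math

def arrangements_by_lemmas(n, k):
    """A_n^k from Lemma 2·1 (A_n^1 = n) and repeated use of
    Lemma 2·2 (A_n^j = A_n^(j-1) * (n - j + 1))."""
    a = n
    for j in range(2, k + 1):
        a *= n - j + 1
    return a

def arrangements_by_formula(n, k):
    """A_n^k = n! / (n - k)!  (Theorem 2·1); k = n relies on 0! = 1."""
    return math.factorial(n) // math.factorial(n - k)

for n in range(1, 9):
    for k in range(1, n + 1):
        assert arrangements_by_lemmas(n, k) == arrangements_by_formula(n, k)

print(arrangements_by_formula(10, 4))  # 10 * 9 * 8 * 7 = 5040
```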

2·5·6. Formula for the number of permutations. According to Definition 2·4, the number $P_n$ of all different permutations of n given objects is the total number of different ways in which these objects can be ordered in a single row. Obviously, this number $P_n$ may be obtained from the formula for $A_n^k$ by substituting k = n. In fact, $A_n^k$ designates the number of different ways in which k objects may be selected out of the given n and ordered. However, there is just one way of selecting n objects out of the given n and, therefore, $A_n^n$ is, in effect, the total number of different ways in which the given n objects can be ordered. Thus

$$P_n = A_n^n = \frac{n!}{0!} = n!.$$

2·5·7. Formula for the number of combinations. The formula for $C_n^k$ is easily obtained by noticing its relation to $A_n^k$ and $P_k$. Whatever k = 1, 2, ..., n, the symbol $C_n^k$ means the number of different ways in which one can select k objects out of the given n. The symbol $P_k$ means the number of different ways in which k objects can be ordered. Finally, $A_n^k$ means the number of different ways in which (i) k objects can be selected out of the given n and (ii) then ordered. Since each of the $C_n^k$ unordered groups can be ordered in $P_k$ ways, and no arrangement can arise from two different groups, we have

$$A_n^k = C_n^k P_k,$$

and it follows that

$$C_n^k = \frac{A_n^k}{P_k} = \frac{n!}{(n-k)!\,k!}.$$

Strictly speaking, the symbol $C_n^k$ has been defined for n ≥ 1 and for 0 < k ≤ n. However, it is convenient to extend its meaning to k = 0. Thus $C_n^0$ means the number of different ways in which one can select zero objects out of the given n. We shall interpret this to mean $C_n^0 = 1$. It will be seen that, with the convention 0! = 1, the formula just deduced also gives the correct answer when k = 0.




Remark. The reader will notice that the formula for $C_n^k$ is symmetric with respect to k and n − k. It follows that

$$C_n^k = C_n^{n-k},$$

so that the number of different ways of selecting k objects out of the given n is equal to the number of different ways of selecting n − k objects. This circumstance should not be surprising because while lifting some k objects we leave n − k objects lying on the table. Thus to each unordered group of k objects there corresponds one and only one well-defined unordered group of n − k objects.
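The following relations can all be checked by brute force for small n:

- $A_n^k = C_n^k P_k$;
- the closed formula for $C_n^k$;
- the convention $C_n^0 = 1$;
- the symmetry $C_n^k = C_n^{n-k}$.

This is a minimal sketch; the helper names `A` and `C` are chosen for illustration only. Python 3.8+ also offers `math.comb(n, k)`.

```python
import math
from itertools import combinations

def A(n, k):
    """Number of arrangements A_n^k = n! / (n - k)!."""
    return math.factorial(n) // math.factorial(n - k)

def C(n, k):
    """Number of combinations C_n^k = n! / ((n - k)! k!); gives C_n^0 = 1."""
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

for n in range(1, 10):
    for k in range(0, n + 1):
        assert A(n, k) == C(n, k) * math.factorial(k)           # A_n^k = C_n^k P_k
        assert C(n, k) == C(n, n - k)                           # symmetry
        assert C(n, k) == len(list(combinations(range(n), k)))  # direct enumeration

print(C(10, 3))  # 120
```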

2·5·8. Newton's binomial expansion. In this subsection we shall use some of the concepts just discussed in order to deduce the famous binomial expansion due to Sir Isaac Newton.

Theorem 2·2. Whatever the numbers a and b and whatever the integer n > 0,

$$N(n) = (a + b)^n = \sum_{k=0}^{n} C_n^k a^{n-k} b^k = a^n + n a^{n-1} b + \frac{n(n-1)}{2} a^{n-2} b^2 + \cdots + b^n.$$

In order to prove the theorem we shall consider n pairs of different quantities $(a_1, b_1), (a_2, b_2), \ldots, (a_n, b_n)$ and discuss the product of the n factors, say

$$X_n = \prod_{i=1}^{n} (a_i + b_i) = (a_1 + b_1)(a_2 + b_2) \cdots (a_{n-1} + b_{n-1})(a_n + b_n).$$

Obviously $(a + b)^n$ can be obtained from $X_n$ by substituting $a_i = a$ and $b_i = b$, for i = 1, 2, ..., n.

We shall establish the following simple propositions regarding the product $X_n$.

(i) If all the multiplications are carried out, then $X_n$ has the form of a sum containing a number of terms of the type, say,

$$T_n = \alpha_1 \alpha_2 \cdots \alpha_k \cdots \alpha_n = \prod_{k=1}^{n} \alpha_k$$

and no others.

(ii) In the above expression for $T_n$, $\alpha_1$ is either $a_1$ or $b_1$, $\alpha_2$ is either $a_2$ or $b_2$, etc., and in general $\alpha_k$ is either $a_k$ or $b_k$, for k = 1, 2, ..., n.

(iii) One of the terms in the expansion of $X_n$, and one only, is equal to the product $\prod_{i=1}^{n} a_i$. Similarly, the expansion of $X_n$ contains exactly one term of the form $\prod_{i=1}^{n} b_i$.

In order to formulate the fourth property of $X_n$ which we shall need, let s be a positive integer less than n and let $\Gamma_s$ denote an arbitrary combination of s numbers selected out of 1, 2, ..., n. Write this combination as

$$\Gamma_s = (m_1, m_2, \ldots, m_s)$$

and let the remaining numbers of the set 1, 2, ..., n be denoted by

$$m_{s+1}, m_{s+2}, \ldots, m_n.$$

(iv) Whatever the integer s between the limits 0 < s < n, and whatever the combination $\Gamma_s$, the expansion of $X_n$ contains the term, say,

$$T_n(\Gamma_s) = \prod_{i=1}^{s} a_{m_i} \prod_{j=s+1}^{n} b_{m_j},$$

and there is just one such term in the expansion of $X_n$.

In order to illustrate proposition (iv), let n = 8 and s = 3. Then $\Gamma_3$ stands for an arbitrary combination of three numbers taken from the eight numbers 1, 2, 3, 4, 5, 6, 7, 8. For example $\Gamma_3$ may be, say,

$$\Gamma_3 = (1, 2, 3)$$

so that $m_1 = 1$, $m_2 = 2$, $m_3 = 3$, $m_4 = 4$, etc., and $m_8 = 8$. On the other hand $\Gamma_3$ may be, say,

$$\Gamma_3' = (2, 4, 6)$$

so that $m_1 = 2$, $m_2 = 4$, $m_3 = 6$, $m_4 = 1$, $m_5 = 3$, $m_6 = 5$, $m_7 = 7$ and $m_8 = 8$. In fact, $\Gamma_3$ stands for an arbitrary combination of three numbers out of 1, 2, ..., 8.

Proposition (iv) asserts then that, whatever $\Gamma_3$, the expansion of $X_8$ contains one term, and one term only, equal to

$$\prod_{i=1}^{3} a_{m_i} \prod_{j=4}^{8} b_{m_j}.$$

Thus, this expansion contains exactly one term of the form

$$T_8(\Gamma_3) = a_1 a_2 a_3 b_4 b_5 b_6 b_7 b_8.$$

Also, it contains exactly one term of the form

$$T_8(\Gamma_3') = b_1 a_2 b_3 a_4 b_5 a_6 b_7 b_8,$$

etc.
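The correspondence between a combination $\Gamma_s$ and the term $T_n(\Gamma_s)$ is easy to mechanize: each index in $\Gamma_s$ contributes its $a$ and every other index contributes its $b$. In this sketch the function `term` is hypothetical. Factors are listed in increasing order of the index, as in the two examples above; by commutativity this is the same product as $\prod a_{m_i} \prod b_{m_j}$.

```python
def term(gamma, n):
    """Factors of T_n(Gamma_s): a_i if i belongs to the combination gamma,
    b_i otherwise, written in increasing order of the index i."""
    chosen = set(gamma)
    return " ".join(f"a{i}" if i in chosen else f"b{i}" for i in range(1, n + 1))

print(term((1, 2, 3), 8))  # a1 a2 a3 b4 b5 b6 b7 b8
print(term((2, 4, 6), 8))  # b1 a2 b3 a4 b5 a6 b7 b8
```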

Before proceeding to the proof of the four propositions, we shall show how they imply Newton's formula. Setting $a_i = a$ and $b_i = b$ for i = 1, 2, ..., n, and using propositions (i) and (ii), we see that the expansion of N(n) is a sum of terms, each of which has the form $a^s b^{n-s}$ with 0 ≤ s ≤ n. According to (iii) this expansion contains the terms $a^n$ and $b^n$, both with coefficients equal to unity. According to (iv), whatever be the positive integer s < n, the expansion of N(n) contains a term with the product $a^s b^{n-s}$. Irrespective of the particular combination $\Gamma_s$, each term $T_n(\Gamma_s)$ reduces to $a^s b^{n-s}$ when each $a_i$ is equated to a and each $b_i$ is equated to b. Furthermore, it is obvious that the coefficient of $a^s b^{n-s}$ in N(n) is equal to the number of different combinations $\Gamma_s$ of s numbers selected out of the given n numbers 1, 2, ..., n. That is, the coefficient of $a^s b^{n-s}$ in the expansion of N(n) is $C_n^s = C_n^{n-s}$, as asserted by the theorem.

It follows that, in order to prove Newton's binomial formula, it is sufficient to prove the four propositions (i) through (iv). It will be convenient to begin by writing down the expression for $X_n$ for the first few values of n and by making a convention about the order in which the particular terms are written. Thus

$$\begin{aligned}
X_1 &= a_1 + b_1,\\
X_2 &= X_1(a_2 + b_2) = X_1 a_2 + X_1 b_2\\
&= a_1 a_2 + b_1 a_2 + a_1 b_2 + b_1 b_2,
\end{aligned}$$

etc., and in general,

$$X_k = X_{k-1}(a_k + b_k) = X_{k-1} a_k + X_{k-1} b_k$$

for k = 2, 3, ..., n.

It is seen that each term of $X_n$ is the result of multiplying n factors of the type $a_k$ or $b_k$, which establishes proposition (i).

In order to establish (ii) we notice that $X_{n-1}$ is independent of $a_n$ and $b_n$ and that, since $X_n = X_{n-1}(a_n + b_n) = X_{n-1} a_n + X_{n-1} b_n$, each term $T_n$ must possess one factor, either $a_n$ or $b_n$, and cannot possess both of them. This conclusion regarding the last pair of numbers $a_n, b_n$ is immediately extended to any other $a_k, b_k$ because the product is independent of the order of the factors and the expression of $X_n$ can be modified to put the factor $(a_k + b_k)$ at the end.

In order to establish (iii) and (iv) it is useful to make a convention about the order in which particular terms and particular factors are written in the process of repeated multiplication leading to $X_n$. Thus we shall agree that, when multiplying $X_k$ by $(a_{k+1} + b_{k+1})$ in order to obtain $X_{k+1}$, we shall write the terms of $X_k a_{k+1}$ first and the terms of $X_k b_{k+1}$ second. Furthermore, the order of terms within each such product will be the previously established order of terms in $X_k$. Applying this to k = 1, 2, ..., n − 1, the order of terms in $X_n$ will be uniquely determined.

As to the order of factors in each term, we shall always begin with either $a_1$ or $b_1$, follow it by $a_2$ or $b_2$, etc. With these two conventions, it is easily seen that, whatever k = 1, 2, ..., n, the first and the last terms in $X_k$ are $\prod_{i=1}^{k} a_i$ and $\prod_{i=1}^{k} b_i$, respectively, which establishes the first part of proposition (iii). In order to establish the second part, namely that $X_n$ cannot contain more than one term of the type either $\prod_{i=1}^{n} a_i$ or $\prod_{i=1}^{n} b_i$, notice that the number of terms $\prod_{i=1}^{k} a_i$ in $X_k$ must be equal to the number of terms of the type $\prod_{i=1}^{k-1} a_i$ in $X_{k-1}$, for k = 2, 3, ..., n, because $X_k = X_{k-1} a_k + X_{k-1} b_k$. Applying this remark in turn to $X_n, X_{n-1}, \ldots, X_3$ and $X_2$, we come to the conclusion that the number of terms $\prod_{i=1}^{n} a_i$ in $X_n$ is equal to the number of terms of the type $a_1$ in $X_1 = a_1 + b_1$. It follows that there is just one term $\prod_{i=1}^{n} a_i$ in $X_n$. The same reasoning applies to $\prod_{i=1}^{n} b_i$, which establishes proposition (iii).

Proposition (iv) is established similarly. We begin by considering a particular combination, say

$$\Gamma_s^* = (1, 2, \ldots, s)$$

composed of the first s members of the set 1, 2, ..., n, and show that $X_n$ must contain the term $\prod_{i=1}^{s} a_i \prod_{j=s+1}^{n} b_j$ and that there is just one such term in the expansion of $X_n$.

According to what is already established, $X_s$ contains the term $\prod_{i=1}^{s} a_i$ and one such term only. As a result of multiplication by $(a_{s+1} + b_{s+1})$ this term will generate just one term $\prod_{i=1}^{s} a_i \, b_{s+1}$ in the expression for $X_{s+1}$. Repeating the same reasoning and applying it to $X_{s+2}, X_{s+3}, \ldots, X_n$, we conclude that $X_n$ contains exactly one term $\prod_{i=1}^{s} a_i \prod_{j=s+1}^{n} b_j$.

Now let $\Gamma_s$ be an arbitrary combination of s numbers out of 1, 2, ..., n. As formerly, we write $\Gamma_s$ as

$$\Gamma_s = (m_1, m_2, \ldots, m_s)$$

and let the remaining n − s numbers be $m_{s+1}, m_{s+2}, \ldots, m_n$. Changing the order of factors in $X_n$ we write its expression as

$$X_n = \prod_{i=1}^{n} (a_{m_i} + b_{m_i}),$$

so that the first factor is now $a_{m_1} + b_{m_1}$, the second $a_{m_2} + b_{m_2}$, etc.




Naturally, this change of order of factors does not affect the value of $X_n$. Upon applying the result just obtained to the new expansion (which is really the old one written in a different order), we conclude that the expression for $X_n$ contains exactly one term equal to

$$\prod_{i=1}^{s} a_{m_i} \prod_{j=s+1}^{n} b_{m_j},$$

which completes the proof of proposition (iv). Thus the formula of Newton is established.
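The whole argument can be replayed on symbols. The sketch below represents each term as a tuple of factor names such as `"a1"` or `"b2"`; this representation is our own choice. It proceeds in three steps:

1. Build $X_n$ by the rule $X_k = X_{k-1}a_k + X_{k-1}b_k$, following the ordering conventions of the text.
2. Check propositions (iii) and (iv) for n = 5.
3. Substitute numbers for a and b and compare the result with Newton's formula.

```python
from collections import Counter
from itertools import combinations
from math import comb

def expand(n):
    """Terms of X_n = (a1 + b1)(a2 + b2)...(an + bn), built by the rule
    X_k = X_{k-1} a_k + X_{k-1} b_k. Each term is a tuple of factor names in
    increasing index order; terms of X_{k-1} a_k are listed first."""
    terms = [("a1",), ("b1",)]                       # X_1 = a1 + b1
    for k in range(2, n + 1):
        terms = [t + (f"a{k}",) for t in terms] + [t + (f"b{k}",) for t in terms]
    return terms

def count_a(term):
    """Number of factors of the form a_i in a term."""
    return sum(f.startswith("a") for f in term)

n = 5
terms = expand(n)
counts = Counter(terms)
assert len(terms) == 2 ** n                           # one of two choices per factor

# (iii): first and last terms are a1...an and b1...bn, each occurring once
all_a = tuple(f"a{i}" for i in range(1, n + 1))
all_b = tuple(f"b{i}" for i in range(1, n + 1))
assert terms[0] == all_a and terms[-1] == all_b
assert counts[all_a] == 1 and counts[all_b] == 1

# (iv): each combination Gamma_s, 0 < s < n, yields exactly one term
for s in range(1, n):
    for gamma in combinations(range(1, n + 1), s):
        t = tuple(f"a{i}" if i in gamma else f"b{i}" for i in range(1, n + 1))
        assert counts[t] == 1

# The number of terms with s factors a_i is C_n^s
by_s = Counter(count_a(t) for t in terms)
assert all(by_s[s] == comb(n, s) for s in range(n + 1))

# Setting a_i = a and b_i = b reproduces (a + b)^n
a, b = 3, -2
value = sum(a ** count_a(t) * b ** (n - count_a(t)) for t in terms)
assert value == (a + b) ** n == sum(comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1))
print(value)  # 1
```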

PROBLEMS AND EXERCISES 

1. Evaluate the following sums 

R 3 4 3 4 3 4 

S 2 S y. ^ S 2 f‘ 

i-ll-2 1=1 I“1+1 i-lj-i+1 

2. Show that, whatever the meaning of the term $a_{ij}$,

tiaa= EEa... 

1-1 j-I j«l i-l 

1 3. Prove that 

£ £ ^ ®'A S hj- 

1-1 ,-l \ 1-1 /\ i-1 / 

* 4. Show that

$$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij} = \sum_{j=2}^{n} \sum_{i=1}^{j-1} a_{ij}.$$

5. Let $T_k = 1 - \frac{1}{k}$. Show that

$$n \prod_{k=2}^{n} T_k = 1.$$

6. Show that 

(na,)(n«,)- n.:. 

7. Show that

$$\prod_{i=1}^{2n} i = 2^n \prod_{k=1}^{n} (2k - 1) \prod_{m=1}^{n} m.$$

8. Compute the quotient 7!/5!. Answer: 42.

9. Compute the quotient

$$\frac{6!}{3!\,2^3}.$$

Answer: 15.





10. Show that, for every positive integer n 

= 1·3·5 ··· (2n − 3)(2n − 1).

What is the value of $a_{10}$?

11. Consider a set of n = 10 objects. In how many different ways can 
one (a) order these objects? (b) select unordered groups of three objects 
each? (c) select ordered groups of four objects each? 

12. Compute $A_6^4$, $P_5$, and $C_6^3$. Answer: 360, 120, 20.

13. Verify Newton's binomial formula putting a = 1, b = 2, n = 4.

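14. Use Newton's binomial formula in order to prove the following identities:

(a) $\displaystyle\sum_{k=0}^{n} \frac{1}{k!\,(n-k)!} = \frac{2^n}{n!}$

(b) $\displaystyle\sum_{k=0}^{n} C_n^k = 2^n$

(c) $\displaystyle\sum_{k=0}^{n} (-1)^k C_n^k = 0$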

(d) $\displaystyle\sum_{k=0}^{m} C_n^k C_{n-k}^{m-k} = 2^m C_n^m$ (Hint: 2 = 1 + 1).

15. Let $m \le n_1 \le n_2$ be positive integer numbers. Use the definition of the symbol $C_n^k$ in order to prove the identity

$$\sum_{k=0}^{m} C_{n_1}^{k} C_{n_2}^{m-k} = C_{n_1+n_2}^{m}.$$

Verify this identity numerically putting m = 3, $n_1 = 4$ and $n_2 = 5$.

16. The identity given in problem 15 expresses $C_{n_1+n_2}^{m}$ in terms of $C_{n_1}^{k}$ and $C_{n_2}^{m-k}$ when $m \le n_1 \le n_2$. Deduce a similar identity which is valid in case $n_1 < m \le n_2$.

17. Deduce an identity similar to that discussed in problem 16 which is valid when $n_1 \le n_2 < m \le n_1 + n_2$.

2·6. Fundamental theorems.

2·6·1. Some definitions and operations relating to properties. The purpose of the theorems proved in this section is to facilitate the computation of probabilities relating to certain typical combinations of properties. The study of these theorems will be preceded by the study of some connections which may exist between two or more properties. For this purpose we introduce some convenient terminology and notation.




Negation. If B stands for a property, then the symbol $\bar B$ will be used to denote the "negation of B" or "non-B," which is the property consisting of failing to possess B. Thus, if B means "black," then $\bar B$ means "not-black" or "any color but black." Obviously, the negation of the negation of any property B is the property B itself: $\bar{\bar B} = B$.

Logical sum. Let $B_1, B_2, \ldots, B_r$ be any properties. The term "logical sum" or simply "sum" of these properties will be used to designate the property, say B', which consists of possessing at least one of the properties "added," i.e., at least one of the properties $B_1, B_2, \ldots, B_r$. The logical sum of properties is denoted by using the sign + or the summation sign Σ,

$$B' = B_1 + B_2 + \cdots + B_r = \sum_{i=1}^{r} B_i.$$

If $B_1$ and $B_2$ stand respectively for "black" and "small," then $B' = B_1 + B_2$ denotes the property consisting of being either "black" or "small" or "black and small."

Logical product. Let $B_1, B_2, \ldots, B_r$ be any properties. The term "logical product" or simply "product" of these properties will be used to designate the property, say B'', which consists of possessing each and every one of the properties $B_1, B_2, \ldots, B_r$. The product of properties is denoted by the same symbol as the product of numbers:

$$B'' = B_1 B_2 \cdots B_r = \prod_{i=1}^{r} B_i.$$

If $B_1$ and $B_2$ stand respectively for "black" and "small," then $B'' = B_1 B_2$ means the property of being "black and small."

From these three definitions it follows easily that, whatever be the properties $B_1, B_2, B_3$,

$$B_1 + B_2 + B_3 = (B_1 + B_2) + B_3, \qquad B_1 B_2 B_3 = (B_1 B_2) B_3, \tag{2·6·1}$$

$$B_1 + B_2 = B_2 + B_1, \qquad B_1 B_2 = B_2 B_1, \tag{2·6·2}$$

$$B_1(B_2 + B_3) = B_1 B_2 + B_1 B_3. \tag{2·6·3}$$

These identities mean that logical summation and multiplication of properties are "associative," "commutative," and "distributive."

Another important identity relating to operations on properties is

$$B_1 B_2 B_3 = \overline{\bar B_1 + \bar B_2 + \bar B_3}. \tag{2·6·4}$$

To convince oneself of the truth of this last equation, it is sufficient to realize that to assert that an object possesses $B_1$ and $B_2$ and $B_3$ [this is the left-hand side of (2·6·4)] means, in effect, to deny that this object fails to possess at least one of these properties [this is the right-hand side of (2·6·4)].

Identities (2·6·1), (2·6·2), (2·6·3), and (2·6·4) are written in relation to only two or three properties. The student will have no difficulty in finding that similar identities hold for any number of properties. For example,

$$(B_1 + B_2)(B_3 + B_4) = B_1 B_3 + B_1 B_4 + B_2 B_3 + B_2 B_4. \tag{2·6·5}$$

The above rules of addition and multiplication of properties are analogous to those in arithmetic. However, the analogy is not complete. For example, whatever the property B, we have the identities

$$B + B = B, \qquad BB = B,$$

which are not generally true in arithmetic.
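As the Remark at the end of this subsection suggests, a property can be modelled by the set of objects possessing it. Logical sum, logical product and negation then become union, intersection and complement within S. The sketch below assumes a hypothetical set S of twenty numbered objects and four arbitrarily chosen properties. It checks identities (2·6·1) through (2·6·5), together with B + B = B and BB = B.

```python
# Hypothetical set S of twenty objects, numbered 0..19. Each property is
# represented by the subset of S whose objects possess it.
S = set(range(20))
B1 = {x for x in S if x % 2 == 0}   # illustrative property: "even"
B2 = {x for x in S if x % 3 == 0}   # "divisible by 3"
B3 = {x for x in S if x < 10}       # "less than 10"
B4 = {x for x in S if x % 5 == 1}   # "leaves remainder 1 on division by 5"

def neg(B):
    """Negation: the objects of S that fail to possess B."""
    return S - B

def lsum(*Bs):
    """Logical sum: objects possessing at least one of the properties."""
    return set().union(*Bs)

def lprod(*Bs):
    """Logical product: objects possessing every one of the properties."""
    return S.intersection(*Bs)

assert neg(neg(B1)) == B1
assert lsum(B1, B2, B3) == lsum(lsum(B1, B2), B3)                       # (2·6·1)
assert lprod(B1, B2, B3) == lprod(lprod(B1, B2), B3)                    # (2·6·1)
assert lsum(B1, B2) == lsum(B2, B1) and lprod(B1, B2) == lprod(B2, B1)  # (2·6·2)
assert lprod(B1, lsum(B2, B3)) == lsum(lprod(B1, B2), lprod(B1, B3))    # (2·6·3)
assert lprod(B1, B2, B3) == neg(lsum(neg(B1), neg(B2), neg(B3)))        # (2·6·4)
assert lprod(lsum(B1, B2), lsum(B3, B4)) == lsum(
    lprod(B1, B3), lprod(B1, B4), lprod(B2, B3), lprod(B2, B4))         # (2·6·5)
assert lsum(B1, B1) == B1 and lprod(B1, B1) == B1                       # B + B = B, BB = B
print("all identities hold")
```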

Impossible property. Let S be a set of objects A and let B be a property. If none of the objects A possesses the property B, then B is called a property impossible in S. On occasion it will be convenient to use the symbol $(0_S)$ to denote a property impossible in S. Obviously, whatever the property $B_1$ and whatever the set S of objects A, the product

$$B_1 \bar B_1 = (0_S)$$

is an impossible property. The probability of an impossible property is zero.

Sure property. If all the objects A of the set S possess the property B, then this property is called a sure property in S. On occasion it will be convenient to use the symbol $(1_S)$ to denote a property which is sure in S. Obviously, whatever the property $B_1$, the logical sum $B_1 + \bar B_1$ is a sure property in S,

$$B_1 + \bar B_1 = (1_S).$$

The probability of a sure property is one.

Properties equivalent in S. If every object A of a set S which possesses a property $B_1$ necessarily possesses another property $B_2$, and vice versa, then $B_1$ and $B_2$ are called equivalent in S. To illustrate this point, consider the set S composed of three categories of balls of which $n_1$ are "black" and "small," $n_2$ are "white" and "large," and $n_3$ are "red" and "large," with $n_1 > 0$, $n_2 > 0$, and $n_3 > 0$. Then the properties "black" and "small" are equivalent in S, but "white" and "large" are not, because the red balls are large without being white.

If $B_1$ and $B_2$ are equivalent in S, it will be convenient to write

$$B_1 \underset{S}{=} B_2.$$

Obviously, if $B_1$ and $B_2$ are equivalent in the F.P.S., then their probabilities are equal.

Properties exclusive in S. Let $B_1$ and $B_2$ be any two properties and $B_1 B_2$ their logical product. If $B_1 B_2$ is impossible in a set S of objects A, then $B_1$ and $B_2$ are called exclusive in S. Also, we use the expression "properties $B_1, B_2, \ldots, B_n$ are exclusive in S" to mean that every two of these properties are exclusive in S.

Some operations on sure and impossible properties. Let B stand for any property and let S denote an arbitrary set of objects A. Then it is easy to see that

$$\begin{aligned}
B(0_S) &= (0_S),\\
B + (1_S) &= (1_S),\\
B + (0_S) &\underset{S}{=} B,\\
B(1_S) &\underset{S}{=} B.
\end{aligned} \tag{2·6·6}$$

The last of these relations is particularly important in certain problems in probability. It is frequently used when the probability of B is difficult to compute by the direct method and when it is possible to find some r properties $B_1, B_2, \ldots, B_r$ such that their logical sum is a property sure in the adopted F.P.S., say $S_1$. Then $B_1 + B_2 + \cdots + B_r = (1_{S_1})$ and formula (2·6·6) gives

$$B \underset{S_1}{=} B(B_1 + B_2 + \cdots + B_r) = B_1 B + B_2 B + \cdots + B_r B. \tag{2·6·7}$$

On occasion, this representation simplifies the computation of the probability of B. This is especially true when the properties $B_1, B_2, \ldots, B_r$ are exclusive. In particular, the following formula is frequently useful:

$$P\{B_1\} = P\{B_1 B_2\} + P\{B_1 \bar B_2\}.$$
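Here is a small numerical illustration of (2·6·7). It assumes as F.P.S. the 36 equally likely outcomes of throwing a red and a green die; the properties chosen are our own. The six properties "the red die shows i" are exclusive and their sum is sure, so P{B} can be computed piece by piece. Adding the pieces relies on the fact that probabilities of exclusive properties add, which is the addition theorem of the next subsection. Exact fractions come from Python's `fractions.Fraction`.

```python
from fractions import Fraction
from itertools import product

# Assumed F.P.S.: the 36 equally likely outcomes (red, green) of two dice.
S = set(product(range(1, 7), repeat=2))

def P(prop):
    """Probability of a property: the proportion of elements of S possessing it."""
    return Fraction(len(prop), len(S))

B = {w for w in S if w[0] + w[1] >= 10}                     # total of at least 10 points
parts = [{w for w in S if w[0] == i} for i in range(1, 7)]  # B_i: red die shows i

# The B_i are exclusive and their logical sum is sure in S ...
assert set().union(*parts) == S
assert all(not (parts[i] & parts[j]) for i in range(6) for j in range(i + 1, 6))
# ... so P{B} is the sum of the P{B_i B}, following (2·6·7)
assert P(B) == sum(P(Bi & B) for Bi in parts)

# The special case P{B1} = P{B1 B2} + P{B1 notB2}
B1 = B
B2 = {w for w in S if w[1] == 6}                            # green die shows six
assert P(B1) == P(B1 & B2) + P(B1 - B2)                     # B1 - B2 models B1 times notB2
print(P(B))  # 1/6
```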

Remark. Rather than think in terms of properties taken in the abstract, so to speak, some readers may find it easier to think in terms of sets of objects possessing these properties. In fact, once a property B is well defined, it is easy to visualize all the objects which possess this property. Thus the definition of the property B may be interpreted as the definition of a set. Thinking in these terms, the definitions of logical sum and of logical product of two or more properties are transferred to the definition of the sum of two or more sets (frequently called union) and to the product of the sets (usually called intersection). Thus the term sum of two sets $S_1$ and $S_2$ or union of $S_1$ and $S_2$ is used to denote the set of those objects which belong to at least one of the sets $S_1$ and $S_2$.




It is suggested that the reader reword the contents of the present subsection in terms of sets. For example, the reader will benefit by formulating the definition of the intersection of two (or more) sets $S_1$ and $S_2$ using the following: if the set $S_1$ is composed of all objects possessing the property $B_1$ and the set $S_2$ of all objects possessing $B_2$, then the intersection of $S_1$ and $S_2$ is composed of all objects possessing the logical product $B_1 B_2$.

PROBLEMS AND EXERCISES 

1. To assert that an object is "made of wood" = $B_1$ or "made of metal" = $B_2$ or "made of glass" = $B_3$ means to deny: that this object "is not made of wood" = $\bar B_1$; that it "is not made of metal" = $\bar B_2$; and that it "is not made of glass" = $\bar B_3$. (a) Write the equivalent of this statement using symbols of logical operations on properties. (b) Deduce the formula so obtained from (2·6·4).

2. Of the objects A forming a set S some are "wooden" = W, some others are of "iron" = I, and all the remaining ones are "of glass" = G. Certain of the objects A are "black" = B. (a) With this definition of B, W, I, and G, consider formula (2·6·7) and write down the exact meaning of the assertion that an object A in S is black. You should start as follows: "to assert that A is 'black' means to assert that ..." (b) To apply formula (2·6·7) it is necessary that $\sum_{i=1}^{r} B_i$ be a property sure in S. Is this condition satisfied in the present problem?

3. Let $B_i$ denote the property "heads on the ith toss of a coin," for i = 1, 2, 3. Consider three tosses of a coin and write in terms of $B_1, B_2, B_3$ the following properties: (a) Heads in all three tosses; (b) Heads in at least one toss; (c) Tails in at least one toss.

4. Draw a system of rectangular coordinates Ox and Oy and consider the set $S_0$ of all points which lie within the circle centered at the origin (0,0) with radius equal to 2. Limit your consideration to the points belonging to $S_0$. Draw three other circles, each of radius equal to unity, with centers at (0,0), (0,1) and (1,0), respectively. We shall say that a point belonging to $S_0$ possesses the property $C_i$ if it lies within the ith circle. Draw a separate diagram representing the set of those points belonging to $S_0$ which possess the property $B_1 = C_1 C_2$. Make a similar diagram for each of the properties $B_i$ defined as follows:

$B_2 = C_1 + C_2$; $B_3 = C_1 + C_2 + C_3$; $B_4 = C_1 C_2 C_3$; $B_5 = C_1 + C_2 C_3$;

B. = (Cl + Ca)(Ci + Ca) ; Br = Cl + Ca ; Ba = CiCa ; Bo = (W. 

5. If the properties C and D are exclusive, are BC and BD necessarily 
exclusive? Explain. 

6. I am going to throw two ordinary dice once. One die is red and the other green. If the red die shows six, we shall say that the throw of the pair of dice has the property $S_R$. Otherwise the pair of throws has the property $\bar S_R$. Similarly if the green die shows six, we shall say that the throw of the pair of dice has the property $S_G$. Using the symbols of operations on properties, write down formulae for the following properties of the throw of two dice: $B_1$ = (at least one die shows "six"); $B_2$ = (double six); $B_3$ = (the two dice show a total of 12 points).

2·6·2. Addition theorem on probabilities.

Theorem 2·3. Whatever the properties $B_1$ and $B_2$ and whatever the F.P.S., the probability of the logical sum $B_1 + B_2$ is equal to the sum of the probabilities of $B_1$ and of $B_2$ minus the probability of their product $B_1 B_2$,

$$P\{B_1 + B_2\} = P\{B_1\} + P\{B_2\} - P\{B_1 B_2\}. \tag{2·6·8}$$

Whatever the F.P.S. for which the properties $B_1$ and $B_2$ are defined, the elements of the F.P.S. can be classified into four categories according to whether they possess both properties $B_1$ and $B_2$, only $B_1$, only $B_2$, or neither. The number of elements in each category will be denoted respectively by $n(B_1 B_2)$, $n(B_1 \bar B_2)$, $n(\bar B_1 B_2)$ and $n(\bar B_1 \bar B_2)$. Let N > 0 be the total of these numbers. Using these symbols, the four probabilities in (2·6·8) are written as follows:

$$\begin{aligned}
P\{B_1 + B_2\} &= \frac{n(B_1 B_2) + n(B_1 \bar B_2) + n(\bar B_1 B_2)}{N},\\
P\{B_1\} &= \frac{n(B_1 B_2) + n(B_1 \bar B_2)}{N},\\
P\{B_2\} &= \frac{n(B_1 B_2) + n(\bar B_1 B_2)}{N},\\
P\{B_1 B_2\} &= \frac{n(B_1 B_2)}{N},
\end{aligned}$$

and it is seen that they satisfy the identity (2·6·8). Q.E.D.
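The counting argument of the proof can be followed step by step in code. This sketch again assumes the two-dice F.P.S., with the illustrative choices $B_1$ = "red die shows six" and $B_2$ = "green die shows six". It forms the four category counts and checks (2·6·8) with exact fractions.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))       # assumed F.P.S.: two dice
B1 = {w for w in S if w[0] == 6}              # red die shows six
B2 = {w for w in S if w[1] == 6}              # green die shows six

# The four categories used in the proof of Theorem 2·3
n11 = len(B1 & B2)          # n(B1 B2)
n10 = len(B1 - B2)          # n(B1 notB2)
n01 = len(B2 - B1)          # n(notB1 B2)
n00 = len(S - (B1 | B2))    # n(notB1 notB2)
N = n11 + n10 + n01 + n00

P_sum = Fraction(n11 + n10 + n01, N)
P_B1 = Fraction(n11 + n10, N)
P_B2 = Fraction(n11 + n01, N)
P_prod = Fraction(n11, N)

assert P_sum == P_B1 + P_B2 - P_prod          # identity (2·6·8)
assert P_sum == Fraction(len(B1 | B2), len(S))
print(P_sum)  # 11/36
```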

If the properties Bi and Bg are exclusive so that PfBiBg} = 0, then 
the addition theorem takes a very simple form: the probability of the 
logical sum of two exclusive properties equals the sum of their respective 
probabilities. 

The addition theorem was stated for the sum of two properties only. 
When the probability of the logical sum of a number r > 2 of properties 
is required, the answer is obtained by several successive applications of 
the addition theorem as stated above. 

Let $B_1, B_2, \cdots, B_{r-1}, B_r$ be any properties. Clearly their logical sum can be written down as the sum of only two properties:

$B_1 + B_2 + \cdots + B_{r-1} + B_r = (B_1 + B_2 + \cdots + B_{r-1}) + B_r$.




Applying the addition theorem just proved, we have

(2·6·9)   $P\{\sum_{i=1}^{r} B_i\} = P\{\sum_{i=1}^{r-1} B_i\} + P\{B_r\} - P\{B_r\sum_{i=1}^{r-1} B_i\}$

$\qquad = P\{B_r\} + P\{\sum_{i=1}^{r-1} B_i\} - P\{\sum_{i=1}^{r-1} B_iB_r\}$

and it is seen that the problem of computing the probability of the logical sum of $r$ properties is reduced to that corresponding to $r - 1$ properties.

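Applying (2·6·8) to the probabilities $P\{\sum_{i=1}^{r-1} B_i\}$ and $P\{\sum_{i=1}^{r-1} B_iB_r\}$ and noticing that $(B_iB_r)(B_{r-1}B_r) = B_iB_{r-1}B_r$, we have

$P\{\sum_{i=1}^{r-1} B_i\} = P\{B_{r-1}\} + P\{\sum_{i=1}^{r-2} B_i\} - P\{\sum_{i=1}^{r-2} B_iB_{r-1}\}$,

$P\{\sum_{i=1}^{r-1} B_iB_r\} = P\{B_{r-1}B_r\} + P\{\sum_{i=1}^{r-2} B_iB_r\} - P\{\sum_{i=1}^{r-2} B_iB_{r-1}B_r\}$.

Substituting these results into (2·6·9) and sorting the terms, we obtain

(2·6·10)   $P\{\sum_{i=1}^{r} B_i\} = P\{B_{r-1}\} + P\{B_r\} - P\{B_{r-1}B_r\} + P\{\sum_{i=1}^{r-2} B_i\}$

$\qquad - P\{\sum_{i=1}^{r-2} B_iB_{r-1}\} - P\{\sum_{i=1}^{r-2} B_iB_r\} + P\{\sum_{i=1}^{r-2} B_iB_{r-1}B_r\}$.

In this last formula the problem is reduced to computing probabilities of logical sums of $r - 2$ properties, etc. In particular, if $r = 3$, then formula (2·6·10) gives

$P\{B_1 + B_2 + B_3\} = P\{B_1\} + P\{B_2\} + P\{B_3\} - P\{B_2B_3\} - P\{B_1B_2\} - P\{B_1B_3\} + P\{B_1B_2B_3\}$.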
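The reduction (2·6·9) is naturally recursive: a sum of $r$ properties is handled through two sums of $r - 1$ properties. The following is a minimal Python sketch, assuming that properties are represented as sets of elements of a finite F.P.S. (logical sum as set union, logical product as intersection); the function name `prob_of_sum` and the three chosen properties are illustrative only. It compares the recursion and the explicit formula for $r = 3$ with a direct count.

```python
from fractions import Fraction
from itertools import product

# F.P.S. (assumed): the 36 throws of two dice; a property is the set of elements having it.
FPS = frozenset(product(range(1, 7), repeat=2))


def P(prop):
    return Fraction(len(prop), len(FPS))


def prob_of_sum(props):
    """P{B1 + ... + Br} by repeated use of (2.6.9); needs only probabilities of products."""
    if len(props) == 1:
        return P(props[0])
    *rest, last = props
    # P{sum of the first r-1} + P{Br} - P{sum of the products Bi Br}
    return prob_of_sum(rest) + P(last) - prob_of_sum([b & last for b in rest])


B1 = frozenset(e for e in FPS if e[0] == 6)          # red die shows six
B2 = frozenset(e for e in FPS if e[1] == 6)          # green die shows six
B3 = frozenset(e for e in FPS if e[0] + e[1] >= 10)  # total of at least 10 points

direct = P(B1 | B2 | B3)
recursive = prob_of_sum([B1, B2, B3])
explicit = (P(B1) + P(B2) + P(B3) - P(B2 & B3) - P(B1 & B2) - P(B1 & B3)
            + P(B1 & B2 & B3))
print(direct, recursive, explicit)  # 1/3 1/3 1/3
```

Only probabilities of products ever enter the recursion, which is the point of the reduction described above.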

The general formula for the probability of the logical sum of $r$ arbitrary properties will be deduced later, after the student is introduced to methods leading to the desired result in a brief and elegant way. At the present moment, we will notice that, if the properties $B_1, B_2, \cdots, B_r$ are mutually exclusive, then the probability of their sum is equal to the sum of their respective probabilities,

(2·6·11)   $P\{\sum_{i=1}^{r} B_i\} = \sum_{i=1}^{r} P\{B_i\}$.





The student will have no difficulty in proving this result either directly or by successive applications of formula (2·6·8).


2·6·3. Probability of a sure property.

Corollary 2·1. If the logical sum of $r$ exclusive properties $B_1, B_2, \cdots, B_r$ is a sure property in the F.P.S., then the probabilities of these properties must add up to unity,

(2·6·12)   $\sum_{i=1}^{r} P\{B_i\} = 1$.

This result follows from formula (2·6·11). In fact, if $\sum_{i=1}^{r} B_i$ is a sure property, then its probability is equal to 1.

Formula (2·6·12) implies that, whatever the property $B$,

$P\{\bar{B}\} = 1 - P\{B\}$.

In fact, the two contrasting properties $B$ and $\bar{B}$ are exclusive and their sum $B + \bar{B}$ is a sure property.


2·6·4. Multiplication theorem on probabilities.

Theorem 2·4. Whatever the properties $B_1$ and $B_2$ and whatever the F.P.S., if $B_1$ is not impossible in this F.P.S., then the probability of the logical product $B_1B_2$ is equal to the absolute probability of $B_1$ multiplied by the relative probability of $B_2$ given $B_1$,

(2·6·13)   $P\{B_1B_2\} = P\{B_1\}\,P\{B_2 \mid B_1\}$.

If $B_2$ is not impossible in the F.P.S., then we have also

$P\{B_1B_2\} = P\{B_2\}\,P\{B_1 \mid B_2\}$.

The proof of this theorem is similar to that of the addition theorem. Using the symbols of Subsection 2·6·2 and assuming that $B_1$ is not an impossible property, so that $n(B_1B_2) + n(B_1\bar{B}_2) > 0$, we write

(2·6·14)   $P\{B_1\} = \dfrac{n(B_1B_2) + n(B_1\bar{B}_2)}{N}$,   $P\{B_2 \mid B_1\} = \dfrac{n(B_1B_2)}{n(B_1B_2) + n(B_1\bar{B}_2)}$,   $P\{B_1B_2\} = \dfrac{n(B_1B_2)}{N}$.

The proof of the theorem is concluded by substituting (2·6·14) into (2·6·13). Q.E.D. Notice that the assumption that $B_1$ is not an impossible property is needed to ensure that the probability $P\{B_2 \mid B_1\}$ has meaning.
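A direct count confirms both forms of the multiplication theorem. The following is a minimal Python sketch, assuming the same two-dice F.P.S.; `rel_prob` is a hypothetical helper that computes a relative probability by restricting the count to the elements having the conditioning property, and it refuses an impossible condition, mirroring the hypothesis of Theorem 2·4.

```python
from fractions import Fraction
from itertools import product

FPS = list(product(range(1, 7), repeat=2))  # assumed F.P.S.: throws (red, green)


def rel_prob(prop, given=lambda e: True):
    """Relative probability of `prop` given `given` (absolute when `given` is omitted)."""
    base = [e for e in FPS if given(e)]
    if not base:
        raise ValueError("the conditioning property is impossible in this F.P.S.")
    return Fraction(sum(1 for e in base if prop(e)), len(base))


def b1(e):  # B1: the red die shows an even number
    return e[0] % 2 == 0


def b2(e):  # B2: a total of at least 9 points
    return e[0] + e[1] >= 9


both = rel_prob(lambda e: b1(e) and b2(e))  # P{B1 B2}
form_1 = rel_prob(b1) * rel_prob(b2, b1)    # P{B1} P{B2 | B1}
form_2 = rel_prob(b2) * rel_prob(b1, b2)    # P{B2} P{B1 | B2}
print(both, form_1, form_2)  # 1/6 1/6 1/6
```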




The multiplication theorem is easily generalized for an arbitrary number of properties. Let $B_1, B_2, \cdots, B_r$ stand for some $r$ properties and $B_0$ for the definition of the objects forming the fundamental probability set. If the product $B_1B_2 \cdots B_{r-1}$ is not impossible, then

(2·6·15)   $P\{B_1B_2 \cdots B_r \mid B_0\} = P\{B_1 \mid B_0\}\,P\{B_2 \mid B_0B_1\} \cdots P\{B_{r-1} \mid B_0B_1 \cdots B_{r-2}\}\,P\{B_r \mid B_0B_1 \cdots B_{r-1}\}$

or, using the product sign,

(2·6·16)   $P\{\prod_{i=1}^{r} B_i \mid B_0\} = \prod_{i=1}^{r} P\{B_i \mid \prod_{j=0}^{i-1} B_j\}$.

Formula (2·6·16) may be read as follows: the probability of the product of $r$ properties $B_1, B_2, \cdots, B_r$ given $B_0$ is equal to the product of $r$ terms, each representing the probability of a particular property $B_i$ given the product of all the preceding properties $B_0, B_1, \cdots, B_{i-1}$. To achieve this convenient formulation, the absolute probability of $B_1$ is formally treated as if it were a relative probability of $B_1$ given $B_0$.

To prove formula (2·6·16), notice that, whatever be $i$, the logical product $B_1B_2 \cdots B_i$ can be written as

$B_1B_2 \cdots B_{i-1}B_i = (B_1B_2 \cdots B_{i-1})B_i$.

Thus, using (2·6·13), we have

(2·6·17)   $P\{B_1B_2 \cdots B_{i-1}B_i \mid B_0\} = P\{B_1B_2 \cdots B_{i-1} \mid B_0\}\,P\{B_i \mid B_0B_1 \cdots B_{i-1}\}$.

Writing a column of formulae of the type (2·6·17) with $i = 2, 3, \cdots, r$, multiplying them sideways and then canceling equal terms on both sides, we obtain (2·6·15).
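Formula (2·6·15) can be checked on a small example. The sketch below assumes a hypothetical urn with three red and two white balls, from which three balls are drawn in order without replacement. $B_0$ is the F.P.S. of the 60 ordered draws, and $B_i$ means "the $i$-th ball drawn is red." Each factor $P\{B_i \mid B_0B_1 \cdots B_{i-1}\}$ is obtained by counting only among the elements that have all the preceding properties. The helper names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

RED = {0, 1, 2}                        # hypothetical urn: balls 0-2 red, balls 3-4 white
FPS = list(permutations(range(5), 3))  # B0: ordered draws of three balls


def rel_prob(prop, given):
    base = [e for e in FPS if given(e)]
    return Fraction(sum(1 for e in base if prop(e)), len(base))


def red_at(i):
    """Property B_{i+1}: the (i+1)-th ball drawn is red."""
    return lambda e: e[i] in RED


def logical_product(props):
    return lambda e: all(p(e) for p in props)  # an empty product is B0 itself


B = [red_at(i) for i in range(3)]  # B1, B2, B3
left = rel_prob(logical_product(B), lambda e: True)  # P{B1 B2 B3 | B0}

right = Fraction(1)
for i in range(3):
    # factor P{B_{i+1} | B0 B1 ... B_i}; for i = 0 the condition is B0 alone
    right *= rel_prob(B[i], logical_product(B[:i]))
print(left, right)  # 1/10 1/10
```

The three factors are $3/5$, $2/4$ and $1/3$, as the counting suggests.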

Remark 1. Formula (2·6·15) was proved under the express assumption that the product $B_1B_2 \cdots B_{r-1}$ is not impossible in the F.P.S. This assumption is necessary so that all the probabilities in the right-hand side have meaning. If, however, it is agreed to consider that a product is equal to zero whenever one of the factors is zero, even though some of the other factors are not defined, then formula (2·6·15) may be written without any restriction.

To see this, notice that, whenever the product $B_1B_2 \cdots B_{r-1}$ is impossible in the F.P.S., so is the product $B_1B_2 \cdots B_{r-1}B_r$, so that the probability in the left-hand side of (2·6·15) is equal to zero. Also, consider the sequence of products of properties

(2·6·18)   $(B_0B_1), (B_0B_1B_2), \cdots, (B_0B_1B_2 \cdots B_k), \cdots, (B_0B_1B_2 \cdots B_k \cdots B_{r-1})$

of which the last is known to be impossible in the F.P.S. Let $B_0B_1 \cdots B_k$ be the first product of the sequence which is impossible in the F.P.S., so that $B_0B_1 \cdots B_{k-1}$ is not impossible. Then the probability $P\{B_k \mid B_0B_1 \cdots B_{k-1}\}$ is well defined and necessarily equal to zero. In particular, it may happen that $B_0B_1$ is impossible. Then the first of the factors on the right-hand side of (2·6·15) is well defined and equal to zero. It follows that, whenever the right-hand side of (2·6·15) contains factors which have no meaning for the reason that some of the products (2·6·18) are impossible properties, it must also contain a well-defined factor equal to zero and, at the same time, the left-hand side of (2·6·15) is equal to zero. Because of this circumstance, we will write formula (2·6·15) whether some of the products (2·6·18) are impossible or not.

Remark 2. The purpose of this remark is to draw the reader's special attention to the fundamental probability sets to which the probabilities mentioned in the addition and the multiplication theorems refer.

The addition theorem is concerned with $P\{B_1 + B_2 \mid A\}$, the probability of the logical sum of two (or more) properties. The content of the theorem is the formula proved above expressing $P\{B_1 + B_2 \mid A\}$ in terms of probabilities of $B_1$, of $B_2$, and of $B_1B_2$, all taken with respect to the same fundamental probability set $A$ with respect to which we want to know the probability of $B_1 + B_2$.

Similarly, the multiplication theorem is concerned with the method by which the probability of the logical product $B_1B_2$ may be calculated. This probability also must refer to a definite F.P.S., say $A'$. The theorem asserts that in order to compute $P\{B_1B_2 \mid A'\}$ it is sufficient to compute at most two probabilities. The first is $P\{B_1 \mid A'\}$, the probability of $B_1$ with respect to the same fundamental probability set $A'$. If this probability is zero, then $P\{B_1B_2 \mid A'\}$ is zero also. If $P\{B_1 \mid A'\} \neq 0$, then, in order to obtain $P\{B_1B_2 \mid A'\}$, we must multiply $P\{B_1 \mid A'\}$ by the probability of $B_2$ computed with respect to the set of those elements of $A'$ which possess the property $B_1$. The essential point is that in both theorems the F.P.S.'s to which the probabilities refer are strictly determined.

Although this remark is trivial and must be obvious to every careful reader, on occasion the definition of fundamental probability sets is neglected. Combined with loose phraseology in the statement of the problem, this neglect leads to all sorts of "paradoxes" allegedly connected with the theory of probability. Here is an example:

It is given that the anti-aircraft defenses of two targets $T_1$ and $T_2$ are such that the probability that an enemy bomber will be shot down over $T_1$ is $p_1 = .6$ and that it will be shot down over $T_2$ is $p_2 = .7$. We are required to compute the probability $p_0$ that the bomber will be shot down either over $T_1$ or over $T_2$.

The reasoning leading to a "paradox" is as follows. The property of "being shot down either over $T_1$ or over $T_2$" is the logical sum of "being shot down over $T_1$" and of "being shot down over $T_2$." The bomber cannot be shot down twice and, therefore, the two properties are exclusive. Thus, the addition theorem gives the paradoxical result:

$p_0 = P\{\text{being shot down either over } T_1 \text{ or over } T_2\} = p_1 + p_2 = 1.3$.

However, this "paradox" disappears immediately when one starts to think about the fundamental probability sets to which the probabilities $p_1$, $p_2$, and $p_0$ refer. In particular, it must be clear at once that $p_1$ and $p_2$ refer to two different F.P.S.'s, say $C_1$ and $C_2$. Thus, $C_1$ may be composed of missions in which target $T_1$ is the only target or in which $T_1$ is attacked as the first of several targets contemplated. Similarly, $C_2$ may be composed of missions in which $T_2$ is attacked either as a single target or as the first of several targets. It is seen that $C_1$ and $C_2$ are two different sets, so that the addition theorem does not apply to the probabilities $p_1$ and $p_2$. Furthermore, the statement of the problem does not specify the fundamental probability set, say $C_0$, to which the probability $p_0$ is supposed to refer. Here several different interpretations are possible.

One possibility is that $p_0$ refers to the set $C_0'$ coinciding with $C_1$ and composed of double missions in which the bomber first attacks $T_1$ and then, if it is not shot down over $T_1$, attacks $T_2$. With this interpretation, $p_2$ stands for the relative probability that the bomber will be shot down over $T_2$, given that it was not shot down over $T_1$. In order to compute the value, say $p_0'$, of the probability that the bomber will be shot down over either $T_1$ or $T_2$, we notice that

$p_0' = 1 - P\{\text{bomber survives both attacks} \mid C_0'\} = 1 - (1 - p_1)(1 - p_2) = p_1 + p_2 - p_1p_2 = .88$.

However, the probability $p_0$ may conceivably refer to another fundamental probability set, say $C_0''$, composed of single missions either against $T_1$ or against $T_2$, in which the pilot is allowed to toss a coin to decide which of the two targets to attack. In this case $C_1$ may be the part of $C_0''$ composed of the missions where the coin falls heads and the only target attacked is $T_1$. Similarly, $C_2$ may coincide with $C_0'' - C_1$ and be composed of the missions in which the coin falls tails and the target attacked is $T_2$. With this interpretation, the two probabilities $p_1$ and $p_2$ are relative probabilities of being shot down, as follows:

$p_1 = P\{\text{being shot down} \mid \text{heads}\}$,

$p_2 = P\{\text{being shot down} \mid \text{tails}\}$.

The value, say $p_0''$, of the absolute probability of being shot down over either target is then computed by first applying the addition and then the multiplication theorem:

$p_0'' = P\{\text{being shot down} \mid C_0''\}$

$\quad = P\{(\text{being shot down})(\text{heads} + \text{tails}) \mid C_0''\}$

$\quad = P\{(\text{heads})(\text{being shot down}) + (\text{tails})(\text{being shot down}) \mid C_0''\}$

$\quad = P\{(\text{heads})(\text{being shot down}) \mid C_0''\} + P\{(\text{tails})(\text{being shot down}) \mid C_0''\}$.

Applying the multiplication theorem we obtain

$p_0'' = P\{\text{heads} \mid C_0''\}\,p_1 + P\{\text{tails} \mid C_0''\}\,p_2$

and, if the coin is "fair,"

$p_0'' = \tfrac{1}{2}p_1 + \tfrac{1}{2}p_2 = .65$.

It is impossible to overemphasize the importance of a clear definition of the F.P.S., which must be given before attempting the solution of any problem on probability.
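The arithmetic of the interpretations can be set side by side. The following Python sketch simply evaluates the formulas derived above with exact fractions, assuming $p_1 = .6$ and $p_2 = .7$ as in the problem and a fair coin for $C_0''$. It shows that neither legitimate answer is the "paradoxical" 1.3.

```python
from fractions import Fraction

p1 = Fraction(6, 10)  # shot down over T1
p2 = Fraction(7, 10)  # shot down over T2 (its F.P.S. depends on the interpretation)

naive = p1 + p2  # illegitimate: adds probabilities referring to different F.P.S.'s

# C0': double missions; T2 is attacked only if the bomber survived T1.
p0_double = 1 - (1 - p1) * (1 - p2)

# C0'': a coin decides the single target; p1, p2 are relative to heads, tails.
p_heads = Fraction(1, 2)  # fair coin (assumed)
p0_coin = p_heads * p1 + (1 - p_heads) * p2

print(float(naive), float(p0_double), float(p0_coin))  # 1.3 0.88 0.65
```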

2·6·5. A theorem on relative probabilities. The following theorem is useful in many problems.

Theorem 2·5. If $B_1$ and $B_2$ are exclusive in the F.P.S. considered, and if the negation $\bar{B}_2$ is not impossible, then

(2·6·19)   $P\{B_1 \mid \bar{B}_2\} = \dfrac{P\{B_1\}}{P\{\bar{B}_2\}} = \dfrac{P\{B_1\}}{1 - P\{B_2\}}$.

This theorem is easily proved by the direct method by which the addition and the multiplication theorems were established. It is suggested that the reader produce such a proof as an exercise. The following proof is based on easy operations on properties.

Since $B_2 + \bar{B}_2$ is a sure property, it follows that $B_1$ is equivalent to the product $B_1(B_2 + \bar{B}_2) = B_1B_2 + B_1\bar{B}_2$. Since $B_1$ and $B_2$ are exclusive in the F.P.S., $B_1B_2$ is impossible and, therefore, $B_1$ is equivalent to $B_1\bar{B}_2$, so that

$P\{B_1\} = P\{B_1\bar{B}_2\}$.

Applying the multiplication theorem to $P\{B_1\bar{B}_2\}$ we find

$P\{B_1\} = P\{B_1\bar{B}_2\} = P\{\bar{B}_2\}\,P\{B_1 \mid \bar{B}_2\}$,

which implies (2·6·19). Q.E.D.
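A quick numerical check of (2·6·19) is sketched below in Python, assuming the two-dice F.P.S. and the exclusive properties $B_1$ = (the red die shows one) and $B_2$ = (the red die shows six). The property functions and the helper `prob` are illustrative names.

```python
from fractions import Fraction
from itertools import product

FPS = list(product(range(1, 7), repeat=2))  # assumed F.P.S.: throws (red, green)


def prob(prop, given=lambda e: True):
    base = [e for e in FPS if given(e)]
    return Fraction(sum(1 for e in base if prop(e)), len(base))


def b1(e):  # red die shows one
    return e[0] == 1


def b2(e):  # red die shows six; exclusive with b1
    return e[0] == 6


def not_b2(e):
    return not b2(e)


assert not any(b1(e) and b2(e) for e in FPS)  # B1 and B2 are exclusive
lhs = prob(b1, not_b2)           # P{B1 | not-B2}
rhs = prob(b1) / (1 - prob(b2))  # P{B1} / (1 - P{B2})
print(lhs, rhs)  # 1/5 1/5
```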

2·6·6. Stochastic independence of properties.

Definition 2·6. If the relative probability of $B_1$ given $B_2$ is equal to the absolute probability of $B_1$,

(2·6·20)   $P\{B_1 \mid B_2\} = P\{B_1\}$,

then $B_1$ is said to be stochastically* independent or, simply, independent of $B_2$.

In order that this definition be applicable, it is necessary that $B_2$ is not an impossible property, so that $P\{B_2\} > 0$. If not only $P\{B_2\} > 0$, but also $P\{\bar{B}_2\} > 0$, then the independence of $B_1$ from $B_2$ implies the independence of $B_1$ from $\bar{B}_2$. In fact, using the notation of Subsection 2·6·2 we can write

$n(B_1B_2) + n(B_1\bar{B}_2) = N\,P\{B_1\}$,

and the content of formula (2·6·20) can be written

$n(B_1B_2) = [n(B_1B_2) + n(\bar{B}_1B_2)]\,P\{B_1\}$.

Subtracting the last formula from the one preceding it and remembering that $N - n(B_1B_2) - n(\bar{B}_1B_2) = n(B_1\bar{B}_2) + n(\bar{B}_1\bar{B}_2)$, we have

(2·6·21)   $n(B_1\bar{B}_2) = [n(B_1\bar{B}_2) + n(\bar{B}_1\bar{B}_2)]\,P\{B_1\}$.

If $\bar{B}_2$ is not an impossible property, then both sides of (2·6·21) can be divided by the expression in square brackets on the right-hand side, giving

$\dfrac{n(B_1\bar{B}_2)}{n(B_1\bar{B}_2) + n(\bar{B}_1\bar{B}_2)} = P\{B_1 \mid \bar{B}_2\} = P\{B_1\}$.

Thus, apart from the case when $\bar{B}_2$ is an impossible property, if $B_1$ is independent of $B_2$, then it is independent of $\bar{B}_2$. It is left to the student to prove that, conversely, if $P\{B_1 \mid B_2\} = P\{B_1 \mid \bar{B}_2\}$, then $B_1$ is independent of $B_2$ in the sense of the original definition (2·6·20).

The excluded case, namely when $\bar{B}_2$ is an impossible property, is trivial because, if $\bar{B}_2$ is impossible in the F.P.S., then $B_2$ must be a sure property and $P\{B_1 \mid B_2\}$ is, really, the probability of $B_1$ taken with respect to the whole fundamental probability set. In other words, it is the relative probability of $B_1$ given $B_2$ in name only; actually, it is the absolute probability of $B_1$.

For this reason, and to avoid the discussion of trivial cases, when considering independence of one property $B_1$ from another $B_2$, it will always be assumed that neither $B_2$ nor $\bar{B}_2$ is an impossible property.
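The result can be illustrated with a pair of properties whose independence is not obvious at first sight. The Python sketch below assumes the two-dice F.P.S. with $B_1$ = (the total is even) and $B_2$ = (the red die shows six), neither $B_2$ nor $\bar{B}_2$ being impossible. It counts the three proportions involved.

```python
from fractions import Fraction
from itertools import product

FPS = list(product(range(1, 7), repeat=2))  # assumed F.P.S.: throws (red, green)


def prob(prop, given=lambda e: True):
    base = [e for e in FPS if given(e)]
    return Fraction(sum(1 for e in base if prop(e)), len(base))


def b1(e):  # the total of points is even
    return (e[0] + e[1]) % 2 == 0


def b2(e):  # the red die shows six
    return e[0] == 6


given_b2 = prob(b1, b2)                       # P{B1 | B2}
given_not_b2 = prob(b1, lambda e: not b2(e))  # P{B1 | not-B2}
absolute = prob(b1)                           # P{B1}
print(given_b2, given_not_b2, absolute)  # 1/2 1/2 1/2
```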

Let the F.P.S. be composed of elements $A$ and let $B_2A$ and $\bar{B}_2A$ stand for the elements which possess properties $B_2$ and $\bar{B}_2$, respectively. Then independence of $B_1$ from $B_2$ means simply that the property $B_1$ is just as frequent among the $B_2A$ objects as it is among the $\bar{B}_2A$ objects, and as it is among all the objects $A$ of the F.P.S. If $A$ stands for "table," $B_1$ for "black," and $B_2$ for "wooden," then independence of $B_1$ from $B_2$ means that black tables are as frequent among wooden tables as they are among nonwooden ones and as they are among all the tables considered.

*The word "stochastic" seems to have been invented by Jacob Bernoulli (1654-1705), one of the founders of the theory of probability. For a time it went out of use but was later revived by W. Bortkiewicz (1868-1931) in his work Die Iterationen (Berlin: Springer, 1917). Since then it has been in general use as a synonym of "probabilistic."

The following is another example illustrating the concept of stochastic independence. If $A$ stands for a human being, $B_1$ for "death from smallpox," and $B_2$ for "being vaccinated," then the independence of $B_1$ from $B_2$ would mean an equal frequency of death from smallpox among the vaccinated and among the nonvaccinated humans. We may hope that death from smallpox is stochastically dependent on vaccination!


2·6·7. Theorem on independence.

Theorem 2·6. If $B_1$ and $B_2$ are not impossible properties and if $B_1$ is independent of $B_2$, then $B_2$ is independent of $B_1$. In this case, we speak of mutual independence of $B_1$ and $B_2$.

To prove this theorem we use the multiplication theorem and write

$P\{B_1B_2\} = P\{B_1\}\,P\{B_2 \mid B_1\} = P\{B_2\}\,P\{B_1 \mid B_2\}$.

This is an identity which holds whenever $B_1$ and $B_2$ are not impossible. Since $P\{B_1\} > 0$, we may divide both sides of the second equality, $P\{B_1\}\,P\{B_2 \mid B_1\} = P\{B_2\}\,P\{B_1 \mid B_2\}$, by $P\{B_1\}$. Since $B_1$ is assumed to be independent of $B_2$, we have $P\{B_1 \mid B_2\} = P\{B_1\}$ and the result of the division is

$P\{B_2 \mid B_1\} = P\{B_2\}$.   Q.E.D.
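Theorem 2·6 can also be checked exhaustively on a small F.P.S. The Python sketch below assumes an F.P.S. of six elements, treats every non-empty subset as a possible property, and verifies for every pair that "$B_1$ is independent of $B_2$" holds exactly when "$B_2$ is independent of $B_1$." This is a finite check under that assumption, not a proof; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

N = 6  # assumed size of the F.P.S.; its elements are 0, ..., N - 1
props = [frozenset(c) for k in range(1, N + 1) for c in combinations(range(N), k)]


def rel_prob(b, given):
    """P{b | given} for properties represented as sets of elements."""
    return Fraction(len(b & given), len(given))


def independent(b1, b2):
    """Definition 2.6: P{B1 | B2} = P{B1}, where P{B1} = |B1| / N."""
    return rel_prob(b1, b2) == Fraction(len(b1), N)


pairs = [(b1, b2) for b1 in props for b2 in props]
assert all(independent(b1, b2) == independent(b2, b1) for b1, b2 in pairs)
print("checked", len(pairs), "pairs")  # checked 3969 pairs
```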


2·6·8. Intuitive difficulties. The concept of stochastic independence is closely related to the vague intuitive idea attached to the word "independence" as used in everyday life. However, as frequently occurs in analogies between mathematical concepts and intuitive notions, the relationship holds only up to a certain point. Therefore, when dealing with stochastic independence, it is necessary to keep in mind the exact content of the definition and not have it obscured by vague feelings of what "independence" should imply. We illustrate below the dangers involved.

(i) If it is given that in an F.P.S. two properties $B_1$ and $B_2$ are mutually independent and a third property $A_1$ is independent of $B_1$ and also independent of $B_2$, then intuition is likely to suggest that $A_1$ must be independent of the logical product $B_1B_2$. The example below shows that this presumption is incorrect.

(ii) If it is given that in an F.P.S. any two of the four properties $A_1$, $A_2$, $B_1$, and $B_2$ are mutually independent (i.e., $A_1$ is independent of $A_2$, of $B_1$, and of $B_2$; $A_2$ is independent of $B_1$ and of $B_2$; and, finally, $B_1$ is independent of $B_2$), and furthermore that both $A_1$ and $A_2$ are independent of the product $B_1B_2$, then intuition is likely to suggest that the product $A_1A_2$ must be independent of the product $B_1B_2$. As will be illustrated in the following example, this presumption is also incorrect.

The following table is divided into four major subdivisions, corresponding to the four possible combinations of properties $A_1$, $\bar{A}_1$, $A_2$, and $\bar{A}_2$. Each of the four major subdivisions is further subdivided into four cells, each cell corresponding to a particular combination of properties $B_1$, $\bar{B}_1$, $B_2$, and $\bar{B}_2$. Altogether, there are sixteen cells, each cell corresponding to a particular combination of $A_1$, $\bar{A}_1$, $A_2$, $\bar{A}_2$, $B_1$, $\bar{B}_1$, $B_2$, and $\bar{B}_2$. The letters within the cells represent the numbers of elements of the fundamental probability set $S$ with the particular combination of properties considered. In the four cells of the first row these numbers are $n(A_1A_2B_1B_2) = a$, $n(A_1A_2B_1\bar{B}_2) = b$, $n(A_1\bar{A}_2B_1B_2) = b$, and $n(A_1\bar{A}_2B_1\bar{B}_2) = a$. It is assumed that $a$ and $b$ are positive integers.

Fundamental Probability Set S

                          A2                    Ā2
                    B2         B̄2         B2         B̄2
    A1     B1       a          b          b          a
           B̄1       b          a          a          b
    Ā1     B1       b          a          a          b
           B̄1       a          b          b          a

To illustrate the phenomenon under (i), consider the part $S(A_2)$ of the set $S$ composed of all elements which possess the property $A_2$. The number of elements in $S(A_2)$ is $n(A_2) = 4(a + b)$.

As the student will have no difficulty in verifying, the probabilities computed for the set $S(A_2)$ are

$P\{A_1 \mid A_2B_1\} = P\{A_1 \mid A_2B_2\} = P\{A_1 \mid A_2\} = \tfrac{1}{2}$.

Thus, in the set $S(A_2)$ the property $A_1$ is independent of $B_1$ and of $B_2$. Moreover,

$P\{B_1 \mid A_2B_2\} = P\{B_1 \mid A_2\bar{B}_2\} = P\{B_1 \mid A_2\} = \tfrac{1}{2}$,

so that $B_1$ and $B_2$ are mutually independent in $S(A_2)$. Yet, the relative probability of $A_1$ given $A_2B_1B_2$ is

$P\{A_1 \mid A_2B_1B_2\} = \dfrac{a}{a + b}$,

which is not necessarily equal to one-half. In fact, if $a = 1$ and $b = 99$, then

$P\{A_1 \mid A_2B_1B_2\} = .01$.

Alternatively, if $a = 99$ and $b = 1$, then

$P\{A_1 \mid A_2B_1B_2\} = .99$.

Thus, according to the values of $a$ and $b$, the property $A_1$ may be dependent on $B_1B_2$ in the set $S(A_2)$ in spite of its independence of the two properties $B_1$ and $B_2$ taken separately.

In order to illustrate the possibility described under (ii), consider the whole fundamental probability set $S$. It is easy to establish that in this set

$P\{A_1\} = P\{A_2\} = P\{B_1\} = P\{B_2\} = \tfrac{1}{2}$,

irrespective of the values of $a$ and $b$. Proceeding as above, the student will easily verify that in $S$ all four properties $A_1$, $A_2$, $B_1$ and $B_2$ are pairwise independent and, moreover, that $A_1$ is independent of $B_1B_2$, $A_2$ is independent of $B_1B_2$, $B_1$ is independent of $A_1A_2$, and $B_2$ is independent of $A_1A_2$. Yet

$P\{A_1A_2 \mid B_1B_2\} = \dfrac{a}{2(a + b)}$,

while

$P\{A_1A_2\} = \tfrac{1}{4}$.

Thus, according to the values of $a$ and $b$, the logical product $A_1A_2$ need not be independent of the logical product $B_1B_2$. This example justifies the following definitions.
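The claims made for this table can be checked mechanically. The following is a minimal Python sketch; it assumes that the table follows the rule visible in its cells, namely that a cell holds $a$ when an even number of the four properties $A_1$, $A_2$, $B_1$, $B_2$ is present and $b$ otherwise (this rule is a reading of the table, not a statement from the text). Probabilities are computed from the cell counts with exact fractions, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

A1, A2, B1, B2 = range(4)  # positions in a cell key (x_A1, x_A2, x_B1, x_B2); 1 = present


def table(a, b):
    """Cell counts of the example: a if an even number of properties is present, else b."""
    return {cell: (a if sum(cell) % 2 == 0 else b) for cell in product((0, 1), repeat=4)}


def P(cells, props, given=()):
    """Relative probability that all `props` are present, given that all `given` are."""
    def n(required):
        return sum(k for cell, k in cells.items() if all(cell[i] for i in required))
    return Fraction(n(props + given), n(given))


def independent(cells, x, y):
    """Is the product of the properties `x` independent of the product of `y`?"""
    return P(cells, x, y) == P(cells, x)


for a, b in [(1, 99), (99, 1), (5, 5)]:
    s = table(a, b)
    # (i) in S(A2): A1 independent of B1 and of B2; B1 and B2 mutually independent ...
    assert P(s, (A1,), (A2, B1)) == P(s, (A1,), (A2, B2)) == P(s, (A1,), (A2,)) == Fraction(1, 2)
    assert P(s, (B1,), (A2, B2)) == P(s, (B1,), (A2,)) == Fraction(1, 2)
    # ... yet P{A1 | A2 B1 B2} = a / (a + b)
    assert P(s, (A1,), (A2, B1, B2)) == Fraction(a, a + b)
    # (ii) in S: every pair independent, and each single property of one pair
    # independent of the product of the other pair ...
    for x in range(4):
        for y in range(x + 1, 4):
            assert independent(s, (x,), (y,))
    for x in (A1, A2):
        assert independent(s, (x,), (B1, B2))
    for y in (B1, B2):
        assert independent(s, (y,), (A1, A2))
    # ... yet P{A1 A2 | B1 B2} = a / 2(a + b), while P{A1 A2} = 1/4
    print(a, b, P(s, (A1, A2), (B1, B2)), P(s, (A1, A2)))
```

For $(a, b) = (1, 99)$ the last line prints `1 99 1/200 1/4`; only when $a = b$ do the two printed probabilities coincide.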

2·6·9. Complete independence of properties.

Definition 2·7. Property $B_0$ is called completely independent of properties $B_1, B_2, \cdots, B_r$ if two conditions are satisfied: (i) $B_0$ is independent of every property $B_1, B_2, \cdots, B_r$ taken separately, and (ii) $B_0$ is independent of the logical product of every group of properties selected out of $B_1, B_2, \cdots, B_r$.

Referring to the example of Subsection 2·6·8, it will be seen that if $a = b$, then $A_1$ is completely independent of $A_2$, $B_1$, $B_2$.

Definition 2·8. Properties $B_1, B_2, \cdots, B_r$ are called mutually completely independent if every one of them is completely independent of the remaining ones.

Referring to the example of Subsection 2·6·8, it is easy to verify that if $a = b$, then the four properties $A_1$, $A_2$, $B_1$, and $B_2$ are mutually completely independent. This, however, does not hold if $a \neq b$.

Consider now two sets of properties, set $S_1$ composed of properties $A_1, A_2, \cdots, A_m$, and set $S_2$ composed of properties $B_1, B_2, \cdots, B_r$.

Definition 2·9. The sets $S_1$ and $S_2$, each composed of several properties, are called completely independent if every property of one set, say $S_1$, and also the product of every group of properties selected from the set $S_1$ are completely independent of the properties forming the other set $S_2$.

With reference to the example of Subsection 2·6·8, it will be noted that the two sets of properties $A_1$, $A_2$ and $B_1$, $B_2$ are completely independent when $a = b$; when $a \neq b$ they are not, because $A_1A_2$ is then dependent on $B_1B_2$.

The attention of the student is called to the fact that in Definitions 2·7 and 2·8 the concept of complete independence refers to single properties. On the contrary, Definition 2·9 deals with complete independence of sets of properties. An important point in Definition 2·9 is that it does not require the mutual independence of the properties forming set $S_1$ or the mutual independence of the properties forming set $S_2$. In fact, two sets of properties can be completely independent while the properties forming one of them are not mutually independent.

The concepts of independence just described are essentially very simple 
but a little delicate. To acquire these concepts thoroughly the student is 
advised to work out the examples at the end of this section and to invent 
similar examples. 

2·6·10. Second theorem on independence.

Theorem 2·7. If a property $B_0$ is completely independent of the properties $B_1, B_2, \cdots, B_r$, where $r \geq 2$, then $B_0$ is independent of the logical sum $B_1 + B_2 + \cdots + B_r$.

If $B_0$ is impossible in the given F.P.S., then the theorem is trivial. In order to prove the theorem when $B_0$ is not impossible, it will be sufficient, by Theorem 2·6, to show that $B_1 + B_2 + \cdots + B_r$ must be independent of $B_0$; i.e., that

$P\{B_1 + B_2 + \cdots + B_r \mid B_0\} = P\{B_1 + B_2 + \cdots + B_r\}$.

Consider first the simplest case of $r = 2$. Use the addition theorem and write

$P\{B_1 + B_2 \mid B_0\} = P\{B_1 \mid B_0\} + P\{B_2 \mid B_0\} - P\{B_1B_2 \mid B_0\}$.

The hypothesis that $B_0$ is completely independent of $B_1$ and $B_2$ implies that

$P\{B_1 \mid B_0\} = P\{B_1\}$,   $P\{B_2 \mid B_0\} = P\{B_2\}$,   $P\{B_1B_2 \mid B_0\} = P\{B_1B_2\}$.

Therefore

$P\{B_1 + B_2 \mid B_0\} = P\{B_1\} + P\{B_2\} - P\{B_1B_2\} = P\{B_1 + B_2\}$.

Thus, $B_0$ is independent of $B_1 + B_2$.

In order to prove the theorem in its full generality, for any number $r$ of properties, we will use the method of mathematical induction.

The method of mathematical induction is applicable in the following circumstances. Let $n$ be any integer and assume that to every integer $r \geq n$ there corresponds a certain proposition $T(r)$. Suppose that it is desired to prove that all these propositions $T(n)$, $T(n + 1)$, $T(n + 2)$, etc., are true. The method of mathematical induction consists, then, in the following two steps. To begin with, we prove directly that the first of the propositions contemplated, namely $T(n)$, is true. If we are successful in this proof, then we proceed to the next step. This second step is somewhat delicate and consists in proving the following theorem for every $m = n, n + 1, n + 2, \cdots$: should the proposition $T(m)$ be true, then the next proposition $T(m + 1)$ must also be true. The combination of this theorem with the previously established fact that $T(n)$ is true implies that $T(n + 1)$ must be true, then that $T(n + 2)$ must be true, etc.; in short, that every proposition $T(r)$ must be true, for $r \geq n$.

In the present case we contemplate an infinity of propositions $T(r)$ corresponding to all values of $r$ exceeding unity, $r = 2, 3, \cdots$. For each $r$ the proposition $T(r)$ is as stated in Theorem 2·7: if a property $B_0$ is completely independent of some $r$ properties $B_1, B_2, \cdots, B_r$, then $B_0$ is independent of the logical sum $B_1 + B_2 + \cdots + B_r$. The first step in the proof by mathematical induction was completed when we proved $T(2)$. To complete the second step we must now prove the following: should it be true that the complete independence of $B_0$ from a given number $m \geq 2$ of properties $B_1, B_2, \cdots, B_m$ implies the independence of $B_0$ from the logical sum $B_1 + B_2 + \cdots + B_m$ [this is proposition $T(m)$], then the complete independence of $B_0$ from any $m + 1$ properties $B_1, B_2, \cdots, B_m, B_{m+1}$ must imply the independence of $B_0$ from the logical sum $B_1 + B_2 + \cdots + B_m + B_{m+1}$ [this is proposition $T(m + 1)$].

Assume then, for a moment, that whatever be the $m$ properties $B_1, B_2, \cdots, B_m$, the complete independence of $B_0$ from these properties implies the independence of $B_0$ from the logical sum $B_1 + B_2 + \cdots + B_m$.





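Assume also that $B_0$ is completely independent of the $m + 1$ properties $B_1, B_2, \cdots, B_m, B_{m+1}$. Consider the probability

(2·6·22)   $P\{\sum_{i=1}^{m+1} B_i \mid B_0\} = P\{\sum_{i=1}^{m} B_i + B_{m+1} \mid B_0\}$

$\qquad = P\{\sum_{i=1}^{m} B_i \mid B_0\} + P\{B_{m+1} \mid B_0\} - P\{B_{m+1}\sum_{i=1}^{m} B_i \mid B_0\}$.

The assumptions made imply that

$P\{\sum_{i=1}^{m} B_i \mid B_0\} = P\{\sum_{i=1}^{m} B_i\}$,   $P\{B_{m+1} \mid B_0\} = P\{B_{m+1}\}$.

As to the last probability in formula (2·6·22), notice that

$B_{m+1}\sum_{i=1}^{m} B_i = \sum_{i=1}^{m} B_iB_{m+1}$.

Since $B_0$ is completely independent of $B_1, B_2, \cdots, B_m, B_{m+1}$, it is also completely independent of the $m$ products

$(B_1B_{m+1}), (B_2B_{m+1}), \cdots, (B_mB_{m+1})$,

because the product of any group of these is itself the product of a group of the properties $B_1, \cdots, B_{m+1}$. According to the assumption made, this implies that the sum of these products is independent of $B_0$ and, therefore,

$P\{B_{m+1}\sum_{i=1}^{m} B_i \mid B_0\} = P\{B_{m+1}\sum_{i=1}^{m} B_i\}$.

This, however, shows that

$P\{\sum_{i=1}^{m+1} B_i \mid B_0\} = P\{\sum_{i=1}^{m+1} B_i\}$,

so that $B_0$ is independent of $B_1 + B_2 + \cdots + B_m + B_{m+1}$. This proves the theorem in its full generality.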

It will be noticed that no assumption was made about the dependence or independence of the properties $B_1, B_2, \cdots, B_r$ among themselves.
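A small computation shows both the theorem and why complete independence, rather than independence of $B_0$ from each $B_i$ separately, is required. The Python sketch assumes two hypothetical examples. In the first, $B_0$ concerns a coin and $B_1$, $B_2$, $B_3$ concern two dice, some of them dependent on others, in line with the remark above. In the second, a standard two-coin construction gives a property independent of $C_1$ and of $C_2$ but not of $C_1C_2$. Independence is tested in the product form $P\{XY\} = P\{X\}P\{Y\}$, which agrees with Definition 2·6 whenever $Y$ is not impossible.

```python
from fractions import Fraction
from itertools import combinations, product


def P(fps, prop):
    return Fraction(sum(1 for e in fps if prop(e)), len(fps))


def logical_sum(props):
    return lambda e: any(p(e) for p in props)


def logical_product(props):
    return lambda e: all(p(e) for p in props)


def independent(fps, x, y):
    """Product form P{XY} = P{X}P{Y}: agrees with Definition 2.6 when Y is not
    impossible and avoids dividing by zero when a product of properties is impossible."""
    return P(fps, logical_product([x, y])) == P(fps, x) * P(fps, y)


def completely_independent(fps, b0, props):
    """Definition 2.7: b0 is independent of the product of every non-empty group of props."""
    return all(independent(fps, b0, logical_product(group))
               for k in range(1, len(props) + 1)
               for group in combinations(props, k))


# Example 1 (hypothetical): a coin and two dice; B0 concerns the coin only,
# while B1, B2, B3 concern the dice (B3 depends on B1 and on B2).
fps1 = list(product("HT", range(1, 7), range(1, 7)))


def b0(e):
    return e[0] == "H"


bs = [lambda e: e[1] == 6, lambda e: e[2] == 6, lambda e: e[1] + e[2] >= 10]
print(completely_independent(fps1, b0, bs), independent(fps1, b0, logical_sum(bs)))
# True True

# Example 2 (hypothetical): two coins; C0 = (the coins agree) is independent of
# C1 and of C2 separately but not of C1C2, and the conclusion of Theorem 2.7 fails.
fps2 = list(product("HT", repeat=2))


def c0(e):
    return e[0] == e[1]


cs = [lambda e: e[0] == "H", lambda e: e[1] == "H"]
print([independent(fps2, c0, c) for c in cs],
      completely_independent(fps2, c0, cs),
      independent(fps2, c0, logical_sum(cs)))
# [True, True] False False
```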


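2·6·11. Multiplication theorems for completely independent properties and sets of properties. The assumption of complete independence introduces considerable simplification in the multiplication theorems.

Theorem 2·8. If the properties $B_1, B_2, \cdots, B_r$ are mutually completely independent, then the probability of their logical product is equal to the product of their absolute probabilities,

$P\{B_1B_2 \cdots B_r\} = P\{B_1\}\,P\{B_2\} \cdots P\{B_r\}$.

This theorem is evident: since each $B_i$ is independent of the product of the preceding properties $B_1, \cdots, B_{i-1}$, every factor $P\{B_i \mid B_0B_1 \cdots B_{i-1}\}$ in (2·6·15) reduces to the absolute probability $P\{B_i\}$.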
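Theorem 2·8 can be illustrated with a hypothetical F.P.S. of three dice, taking properties that concern different dice and are therefore mutually completely independent. The same two-coin construction as in the previous sketch shows that pairwise independence alone does not yield the product rule. The helper `P` is an illustrative name.

```python
from fractions import Fraction
from itertools import product
from math import prod


def P(fps, props):
    """Probability that an element of `fps` possesses all of the properties `props`."""
    return Fraction(sum(1 for e in fps if all(p(e) for p in props)), len(fps))


# Assumed F.P.S.: three dice; each property concerns a different die.
dice = list(product(range(1, 7), repeat=3))
bs = [lambda e: e[0] == 6, lambda e: e[1] % 2 == 0, lambda e: e[2] <= 2]
print(P(dice, bs), prod(P(dice, [b]) for b in bs))  # 1/36 1/36

# Two coins: C0 = (coins agree), C1 = (first heads), C2 = (second heads) are pairwise
# independent but not mutually completely independent, and the product rule fails.
coins = list(product("HT", repeat=2))
cs = [lambda e: e[0] == e[1], lambda e: e[0] == "H", lambda e: e[1] == "H"]
print(P(coins, cs), prod(P(coins, [c]) for c in cs))  # 1/4 1/8
```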




Theorem 2·9. If $S_1$ and $S_2$ are two sets of properties, one set completely independent of the other, then whatever be the group $A_1, A_2, \cdots, A_m$ of properties selected from $S_1$, and whatever be the group $B_1, B_2, \cdots, B_r$ of properties selected from $S_2$, the probability of the logical product of the two sums $\sum_{i=1}^{m} A_i$ and $\sum_{j=1}^{r} B_j$ is equal to the product of their respective absolute probabilities,

$P\{(\sum_{i=1}^{m} A_i)(\sum_{j=1}^{r} B_j)\} = P\{\sum_{i=1}^{m} A_i\}\,P\{\sum_{j=1}^{r} B_j\}$.

To prove this theorem, notice that every property $B_j$ is independent of $\sum_{i=1}^{m} A_i$ by the second theorem on independence. Since the sets $S_1$ and $S_2$ are completely independent, every product of properties selected from $S_2$ is completely independent from the properties forming the set $S_1$. Hence, by the second theorem on independence, every product of properties selected from $S_2$ must be independent from $\sum_{i=1}^{m} A_i$. But this means that $\sum_{i=1}^{m} A_i$ is completely independent from the properties forming the set $S_2$. Thus, by the second theorem on independence, $\sum_{i=1}^{m} A_i$ is independent from $\sum_{j=1}^{r} B_j$, which proves Theorem 2·9.
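A numerical check of Theorem 2·9 is sketched below in Python under hypothetical assumptions. The F.P.S. consists of the throws of four dice, the group from $S_1$ contains properties of the first two dice and the group from $S_2$ properties of the last two, so that one set is completely independent of the other. The properties within each group are deliberately chosen to be dependent on each other, which the theorem permits.

```python
from fractions import Fraction
from itertools import product

FPS = list(product(range(1, 7), repeat=4))  # assumed F.P.S.: four dice


def P(prop):
    return Fraction(sum(1 for e in FPS if prop(e)), len(FPS))


# Group from S1 (first two dice); A1 and A2 are dependent on each other.
A = [lambda e: e[0] == 6, lambda e: e[0] + e[1] >= 10]
# Group from S2 (last two dice); B1 and B2 are dependent on each other.
B = [lambda e: e[2] % 2 == 0, lambda e: e[2] + e[3] >= 10]


def sum_A(e):  # logical sum A1 + A2
    return any(a(e) for a in A)


def sum_B(e):  # logical sum B1 + B2
    return any(b(e) for b in B)


left = P(lambda e: sum_A(e) and sum_B(e))  # P{(A1 + A2)(B1 + B2)}
right = P(sum_A) * P(sum_B)                # P{A1 + A2} P{B1 + B2}
print(left == right, P(sum_A), P(sum_B))  # True 1/4 5/9
```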

This theorem is easily generalized for any number of groups of mutually 
completely independent sets of properties. 


PROBLEMS AND EXERCISES 

1. Prove that if $B_1$ is independent of $B_2$, then $\bar{B}_1$ is also independent of $B_2$.

2. Assume that the properties $B_1$ and $B_2$ are not impossible and answer the following questions. (a) If $B_1$ and $B_2$ are exclusive, then must they be independent? Why? (b) If $B_1$ and $B_2$ are independent, then must they be exclusive? Why?

3. The objects $A$ of two fundamental probability sets are represented by the small squares in Figures 3 and 4, respectively. Different shadings mark the objects having properties $B_1$, $B_2$, $B_3$. Compute the absolute and relative probabilities of all the properties involved and state which of them are independent.

In particular, investigate the independence of $B_1$ from $B_2$, $B_3$, $B_2B_3$ and $B_2 + B_3$.

4. Apply the method of shaded squares as in Problem 3 and construct an example of an F.P.S. in which $B_1$ is independent of $B_2$, of $B_3$, and of $B_2B_3$. Investigate the independence of $B_1$ from $B_2 + B_3$.




5. Construct an example of an F.P.S. in which $B_1$ is independent of $B_2B_3$ but not of $B_2$ and $B_3$ taken separately.

6. Let $B_1$ be independent of $B_2$ and $B_3$ and assume $B_2B_3$ is not impossible. Prove that if $B_1$ is also independent of $B_2 + B_3$, then it is completely independent of $B_2$ and $B_3$.

7. Construct an example of two sets $A$ and $B$ of properties $A_1$, $A_2$ and $B_1$, $B_2$, respectively, and an F.P.S. such that (i) $A_1$ and $A_2$ are dependent, (ii) $B_1$ and $B_2$ are dependent, and (iii) the set $A$ is completely independent of the set $B$.

[Figure 3]

[Figure 4]

8. Use the method of mathematical induction to prove Newton's binomial formula.

9. Deduce a formula giving the probability of the logical sum of four arbitrary properties, $P\{B_1 + B_2 + B_3 + B_4\}$, using the method of induction.

10. Prove that if every element of the F.P.S. which possesses the property 
B also possesses the property D and if B and D are independent, then D is a 
sure property. 

11. If you had to construct a mathematical model for situations involving pairs of properties $B$ and $C$ as defined below, would it be appropriate to assume that $B$ and $C$ are stochastically independent? Explain the reasons for your opinion.

(a) $B$: carrying life insurance,
$C$: having tuberculosis.

(b) $B$: being listed in the telephone directory,
$C$: owning a car.

(c) $B$: the husband having blue eyes,
$C$: the wife having blue eyes.






In each case state the equalities or inequalities between absolute and relative probabilities which you believe to hold.

12. Arrange the following four quantities in increasing order of size with 
correct equality or inequality signs between them: 

$P\{B\}$, $P\{B + C\}$, $P\{BC\}$,

13. It is given that P\Bi -f Pzj = i , 

6 

P\B,B,\ = I, 

Pm = I • 

Determine $P\{B_1\}$, $P\{B_2\}$ and $P\{B_1 \mid B_2\}$.

14. In a particular F.P.S. the three properties B, C and D are completely 
independent with probabilities P{B\ = P{C\ — and P{D} = |. 
Consider the properties E, F, G, and H defined below and compute their 
probabilities. You should begin by simplifying the definitions of E, F, G, 
and H as much as possible. Explain each of your steps. 

E = (B + 0{B + C), 

F = (B + C + D),

G = B C D (B + C), 

H = (BD + C)(BB + CD + BCD). 

2·7. The problem of bag and boxes

In this subsection we will illustrate the use of the addition and multiplication theorems by solving the problem of Subsection 2·4·1 in a little more general form. Consider one bag and m boxes filled with balls. The balls in the bag are marked No. 1, No. 2, …, or No. m. The proportion of balls bearing one particular number is not necessarily equal to that bearing another and we will denote by $p_i$ the probability that a ball drawn from the bag will bear No. i, for i = 1, 2, …, m. Each of the m boxes contains black and white balls. The probability that a ball drawn from the i-th box will be black is denoted by $\alpha_i$, i = 1, 2, …, m. An experiment E is performed consisting of drawing a ball from the bag and then another ball from the box whose number corresponds to that on the ball drawn from the bag. Assuming that the probabilities $p_i$ and $\alpha_i$ are known for i = 1, 2, …, m, it is required to compute the probability P{b} that the ball drawn in the second draw will be black.




As was pointed out in Subsection 2·4·1, the fundamental probability set S consists of "pairs of draws," one draw from the bag and the other from the appropriate box as determined by the first draw.

All the pairs of draws of the F.P.S. may be classified into 2m categories according to the 2m combinations of possible outcomes of the two draws. These combinations will be denoted by (ib) and (iw), where i stands for the number on the ball drawn from the bag and b and w for the color of the ball drawn from the box.

Obviously,

$$p_i = P\{i\}$$

is the absolute probability of the property i of the pair of draws. Further,

$$\alpha_i = P\{b \mid i\}$$

is the relative probability of b given i. In order to compute P{b}, we notice that in the F.P.S. considered, the property b is equivalent to the logical sum

$$b = (1b) + (2b) + \cdots + (ib) + \cdots + (mb) = \sum_{i=1}^{m} (ib),$$

which, in effect, is the application of formula (2·6·7).

Therefore $P\{b\} = P\left\{\sum_{i=1}^{m} (ib)\right\}$ and can be computed by the application of the addition theorem. For $i \neq j$ the properties (ib) and (jb) are exclusive. In fact, if the ball drawn from the bag bears No. i, then at the same time it cannot bear another No. j. Thus, the addition theorem gives

$$P\{b\} = P\left\{\sum_{i=1}^{m} (ib)\right\} = \sum_{i=1}^{m} P\{ib\},$$

and all that remains is the computation of the probability of the logical product (ib). By the multiplication theorem

$$P\{ib\} = P\{i\}\,P\{b \mid i\} = p_i \alpha_i.$$

Hence, finally,

$$P\{b\} = \sum_{i=1}^{m} p_i \alpha_i.$$
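The formula $P\{b\} = \sum_i p_i \alpha_i$ can be checked numerically. The Python sketch below is only an illustration: the function names and the particular values of $p_i$ and $\alpha_i$ (three boxes) are hypothetical. The exact value uses fractions; the simulation repeats the two-stage experiment E and should give a relative frequency of black balls close to the computed probability.

```python
import random
from fractions import Fraction

# Illustrative sketch; function names and numerical values are hypothetical.

def prob_black(p, alpha):
    """P{b} = sum of p_i * alpha_i (addition and multiplication theorems)."""
    if len(p) != len(alpha):
        raise ValueError("need one alpha_i for every box")
    if abs(sum(p) - 1) > 1e-12:
        raise ValueError("the p_i must add up to 1")
    return sum(p_i * a_i for p_i, a_i in zip(p, alpha))

def simulate(p, alpha, trials, seed=0):
    """Repeat experiment E `trials` times; return the relative frequency of black."""
    rng = random.Random(seed)
    boxes = range(len(p))
    black = 0
    for _ in range(trials):
        i = rng.choices(boxes, weights=[float(x) for x in p])[0]  # draw from the bag
        if rng.random() < alpha[i]:                               # draw from box i
            black += 1
    return black / trials

# Hypothetical bag with three numbers and three boxes.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
alpha = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
print(prob_black(p, alpha))         # 5/12
print(simulate(p, alpha, 100_000))  # a relative frequency near 5/12
```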

The example just solved is a schematic one. Ordinarily, the use of the 
fundamental theorems on probability is illustrated on similar schematic 
examples or on examples relating to games of chance. While such examples 
have considerable value, it seems preferable that the applications of the 
theorems on probability be illustrated by problems which are closer to 
actual research and are of more general interest. Therefore, the next 




section is given to an interesting category of problems on risks and the 
whole of Chapter III, to certain simple problems in genetics. 

PROBLEMS AND EXERCISES 

1. As a result of many observations it was established that 5 per cent of all aircraft sent on a particular kind of operation fail to return. This result is occasionally interpreted to mean that an aircraft has no chance of surviving 20 sorties of this particular type. Is this interpretation correct? Denote by p the probability that an aircraft will survive a single sortie and assume that survivals of consecutive sorties are completely independent. (a) Compute the probability that the aircraft will not survive n consecutive sorties. (b) What is the probability P(n) that the aircraft will survive n sorties? Substitute p = .95 and n = 20 and obtain the corresponding value of P(20). Partial Answer: 0.358.
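Under the independence assumption of the problem, $P(n) = p^n$, and the probability of not surviving all n sorties is $1 - p^n$. A minimal check of the partial answer (the function names are hypothetical):

```python
# Illustrative sketch; function names are hypothetical.

def survive_all(p, n):
    """P(n): probability of surviving n independent sorties, each survived with probability p."""
    return p ** n

def lost_within(p, n):
    """Probability of not surviving all n sorties, the complement of P(n)."""
    return 1 - survive_all(p, n)

print(round(survive_all(0.95, 20), 3))  # 0.358
print(round(lost_within(0.95, 20), 3))  # 0.642
```

The loss of 5 per cent per sortie therefore does not make loss within 20 sorties certain.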

2. A simplified description of the life cycle of an organism A is as follows. If the organism survives a specified period of time T, then it splits into two new organisms. This process of multiplication is called fission. During the period T between two consecutive fissions each organism is subject to the risk of death and the probability of death is q. Assume that the probability of death q = .3 and the period T = 20 minutes are the same for all organisms A. Assume also that the survivals of particular organisms are completely independent. Compute the probability $P_1$ that there will be eight organisms alive at the end of one hour in an experiment beginning with one organism. Compute the probability $P_2$ that there will be five. Describe in detail the F.P.S. to which the probability $P_1$ refers.
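A sketch of the computation, under an assumption the problem leaves implicit: fission is taken to occur at the end of each period T, so that every organism surviving a period is replaced by two. One hour then covers three periods. The code tracks the whole distribution of the number of organisms; the function name is hypothetical.

```python
from math import comb

# Illustrative sketch; the function name is hypothetical.
# Assumption: fission happens at the end of each period T.

def population_distribution(q, periods, start=1):
    """Distribution of the number of organisms after `periods` periods T."""
    p = 1 - q
    dist = {start: 1.0}
    for _ in range(periods):
        new = {}
        for n, prob in dist.items():
            for s in range(n + 1):  # s survivors out of n, independently
                w = prob * comb(n, s) * p**s * q**(n - s)
                new[2 * s] = new.get(2 * s, 0.0) + w  # each survivor splits in two
        dist = new
    return dist

d = population_distribution(q=0.3, periods=3)  # one hour = three periods of 20 minutes
print(round(d.get(8, 0.0), 4))  # 0.0824, that is 0.7**7
print(d.get(5, 0.0))            # 0.0: an odd count cannot occur under this assumption
```

Eight organisms require that all seven organisms alive during the hour (1 + 2 + 4) survive their periods.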

3. In an experiment with an insecticide on mosquitoes, it is found that 80 per cent are killed in the initial application, but that those which survive develop a resistance. The percentage of the survivors killed in any later application is half that of the immediately preceding application; i.e., 40 per cent of the survivors of the first application would succumb to the second application while only 20 per cent of the survivors of two applications would be killed by a third, etc. Find the probability (a) that a mosquito will survive five applications; (b) that it will survive five, given that it has survived the first two. Answer: (a) 0.082, (b) 0.684.
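By the multiplication theorem the probability of surviving all five applications is the product of the survival probabilities at each application, and the conditional probability in (b) reduces to the product of the last three factors. A short sketch (function names are hypothetical):

```python
# Illustrative sketch; function names are hypothetical.

def kill_fractions(first=0.8, k=5):
    """Fraction of survivors killed at applications 1..k; each is half the previous one."""
    return [first / 2**i for i in range(k)]

def survive_probability(kills):
    """Probability of surviving every application in `kills` (multiplication theorem)."""
    prob = 1.0
    for f in kills:
        prob *= 1 - f
    return prob

kills = kill_fractions()                         # [0.8, 0.4, 0.2, 0.1, 0.05]
print(round(survive_probability(kills), 3))      # (a) 0.082
# (b) P{survive 5 | survive 2} = P{survive 5} / P{survive 2}: the first two factors cancel.
print(round(survive_probability(kills[2:]), 3))  # (b) 0.684
```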

4. My telephone number is 3-4629. All car licenses in my State are five-digit numbers not beginning with zero (thus, from 10000 to 99999) and on registering my car I shall be allowed to select my car license at random. What is the probability $P_1$ that my license number will be the same as my telephone number? My friend is about to receive his new telephone number (also selected at random) and his car license number. What is the probability $P_2$ that the two will coincide? Describe in detail the fundamental probability sets to which the probabilities $P_1$ and $P_2$ refer.

Partial Answer: $P_1$ = 0.000011.




5. Professor X always fails exactly five of his students in Astrology 99, namely the five weakest in the course. This year there are 10 students in this course. (a) An assistant has taken 5 of the students' names out of a hat. What is the probability $P_2$ that all 5 persons whose names were drawn will fail? (b) An assistant is going to pick 5 of the students' names out of a hat. What is the probability $P_1^*$ that among the names drawn there will be the name of a given person? What is the probability $P_2^*$ that all 5 persons whose names will be drawn will fail? Compute this probability both by the direct method and by using the multiplication theorem. Describe in detail the fundamental probability sets to which the probabilities with asterisks and those without refer.
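The two routes to $P_2^*$ requested in part (b) can be compared with exact fractions. A sketch in which the variable names are hypothetical: the direct method counts one favourable set of five names among all $\binom{10}{5}$ equally likely sets, while the multiplication theorem multiplies the conditional probabilities that each successive name belongs to a failing student.

```python
from fractions import Fraction
from math import comb

# Illustrative sketch; variable names are hypothetical.
N, FAIL, DRAWN = 10, 5, 5

# P_1^*: a given student is among the 5 names drawn.
p1_star = Fraction(comb(N - 1, DRAWN - 1), comb(N, DRAWN))

# P_2^* by the direct method: one favourable set of 5 names out of C(10, 5).
p2_direct = Fraction(comb(FAIL, DRAWN), comb(N, DRAWN))

# P_2^* by the multiplication theorem: each successive name drawn must belong
# to a failing student, given that all earlier ones did.
p2_mult = Fraction(1)
for k in range(DRAWN):
    p2_mult *= Fraction(FAIL - k, N - k)

print(p1_star, p2_direct, p2_mult)  # 1/2 1/252 1/252
```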

6. A dealer receives ten new radios. He will test them at random until he finds one which is working satisfactorily. Let X denote the number of radios tested. What are the possible values of X if 4 of the 10 radios are defective? What are the corresponding probabilities?

Partial Answer: 0.600, 0.267, 0.100, 0.029, 0.005.
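X = x means that the first x − 1 radios tested are defective and the x-th works; by the multiplication theorem this is a product of conditional probabilities. With 4 defective radios, X can take only the values 1 to 5. A sketch (the function name is hypothetical):

```python
from fractions import Fraction

# Illustrative sketch; the function name is hypothetical.

def tests_until_good(total=10, defective=4):
    """Distribution of X, the number of radios tested until the first good one."""
    good = total - defective
    dist = {}
    p_all_bad_so_far = Fraction(1)
    for x in range(1, defective + 2):
        remaining = total - (x - 1)
        # x-th radio is good, given the first x - 1 were all defective
        dist[x] = p_all_bad_so_far * Fraction(good, remaining)
        # x-th radio is defective as well
        p_all_bad_so_far *= Fraction(defective - (x - 1), remaining)
    return dist

for x, p in tests_until_good().items():
    print(x, round(float(p), 3))  # 1 0.6, 2 0.267, 3 0.1, 4 0.029, 5 0.005
```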

7. A bombing plane will accomplish its mission only if the navigator finds the target and the bombardier hits the target when it is found. Navigators n and N have probabilities .9 and .8, respectively, of finding a certain target. Bombardiers b and B have probabilities .7 and .6, respectively, of hitting the target when found. In making up navigator-bombardier teams for the two airplanes which will both attack the target but operate independently, n may be paired with either b or B. The target will be destroyed if either airplane is successful. Which pairing gives the greater probability of destroying the target and by how much? Partial Answer: 0.01.
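Each plane succeeds with the product of its navigator's and bombardier's probabilities, and the target escapes only if both independent planes fail. A sketch comparing the two pairings with exact fractions (names are hypothetical):

```python
from fractions import Fraction as F

# Illustrative sketch; names are hypothetical.
NAV = {"n": F("0.9"), "N": F("0.8")}   # probability of finding the target
BOMB = {"b": F("0.7"), "B": F("0.6")}  # probability of hitting it when found

def destroy_probability(pairs):
    """Target destroyed unless every independently operating plane fails."""
    all_fail = F(1)
    for nav, bomb in pairs:
        all_fail *= 1 - NAV[nav] * BOMB[bomb]  # a plane succeeds only if both men succeed
    return 1 - all_fail

p1 = destroy_probability([("n", "b"), ("N", "B")])
p2 = destroy_probability([("n", "B"), ("N", "b")])
print(float(p1), float(p2), float(p1 - p2))  # 0.8076 0.7976 0.01
```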

8. Consider a matching problem of the following kind. A student is given a column of 10 dates and a column of 10 events and is asked to match the correct date to each event. He is not allowed to use any item more than once. Consider the case where the student knows how to match four of the items but is very doubtful about the remaining six. He decides to match these six at random. Find the probabilities that he will match correctly (a) all the items, (b) at least seven of the items, (c) at least five.

Answer: (a) 1/720, (b) 7/90, (c) 91/144.
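Matching the six doubtful items at random amounts to choosing one of the 6! = 720 equally likely permutations; a correct match is a fixed point of the permutation. The sketch below counts permutations by their number of fixed points and adds the four known matches (names are hypothetical):

```python
from fractions import Fraction
from itertools import permutations

# Illustrative sketch; names are hypothetical.
KNOWN, GUESSED = 4, 6

def match_distribution(n=GUESSED):
    """Probability of exactly c correct random matches, c = 0..n."""
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        correct = sum(1 for i, j in enumerate(perm) if i == j)
        counts[correct] += 1
    total = sum(counts)
    return [Fraction(c, total) for c in counts]

dist = match_distribution()

def at_least(total_correct):
    need = max(total_correct - KNOWN, 0)  # correct matches still needed among the six
    return sum(dist[need:])

print(at_least(10), at_least(7), at_least(5))  # 1/720 7/90 91/144
```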

9. Consider the following paradox. The records of two weather forecasters show that forecaster A is right nine times out of ten and forecaster B is right eight times out of ten. For May 8th, A's forecast is $R$ = "rain" and B's forecast is $\bar R$ = "dry weather". Since $R + \bar R$ is a sure property, its probability is equal to unity. Thus $P\{R + \bar R\} = 1$. On the other hand, $R$ and $\bar R$ are exclusive. Therefore, by virtue of the addition theorem,

$$P\{R + \bar R\} = P\{R\} + P\{\bar R\}.$$

But the probability of rain is the probability




that A's forecast is right, thus $P\{R\} = .9$. Similarly, $P\{\bar R\} = .8$. Combining the results obtained we have

$$1 = P\{R + \bar R\} = P\{R\} + P\{\bar R\} = .9 + .8 = 1.7.$$

What is the resolution of this paradox? 

10. In order to verify the contention of the existence of extrasensory perception the following experiment is sometimes performed. Eight cards, four red and four black, are shuffled and then each is looked at successively by the experimenter. In another room the subject of study attempts to guess whether the card looked at by the experimenter is red or black. He is required to say "black" four times and "red" four times. If the subject of the study has no extrasensory perception, then his calling of the colors "red" and "black" eight times in succession is comparable to arranging in a random order eight objects of which four are red and the other four are black. On this assumption, what is the probability that the subject will "guess" correctly the colors of six out of the eight cards?

Answer: 8/35.
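On the stated assumption, the four "red" calls fall on a random set of four of the eight cards. If r of them land on red cards, then 4 − r land on black cards, which leaves r black cards to receive "black" calls, so the number of correct guesses is always 2r. A sketch of the resulting hypergeometric computation (the function name is hypothetical):

```python
from fractions import Fraction
from math import comb

# Illustrative sketch; the function name is hypothetical.

def prob_correct(k, cards=8, red=4):
    """P(exactly k colours guessed right) when the 'red' calls fall at random."""
    black = cards - red
    # r red calls on red cards imply r black calls on black cards: 2r correct.
    if k % 2:
        return Fraction(0)
    r = k // 2
    return Fraction(comb(red, r) * comb(black, red - r), comb(cards, red))

print(prob_correct(6))                         # 8/35
print(sum(prob_correct(k) for k in range(9)))  # 1
```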


11. A psychologist wishes to establish whether or not the removal of a certain section of a rat's brain destroys the rat's memory. He begins by teaching a group of 6 rats that only one out of 5 paths in a maze leads to food. After all the rats learn to identify the right path without fail, the relevant sections of their brains are removed. When the wound is healed, the rats are again put in the same maze. It may be considered that if the rat's memory is destroyed then the probability p of a rat finding the right path is equal to .2. Let X denote the number of rats which will select the right path. Obviously X may have values 0, 1, 2, 3, 4, 5, and 6.

(a) Assume that the operated rats have no memory at all and compute the probability P{X = k} for k = 0, 1, 2, 3, 4, 5, 6.

(b) The psychologist decides to consider that the operated rats still have some memory left if the value assumed by X is at least equal to 3. Compute the performance characteristic of this rule of inductive behavior for p = .2, .4, .6, .8, and 1.0.
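Assuming, as the problem suggests, that the six rats choose independently, X is binomial: $P\{X = k\} = \binom{6}{k} p^k (1-p)^{6-k}$. The performance characteristic of the rule is then $P\{X \ge 3\}$ as a function of p. A sketch (function names are hypothetical):

```python
from math import comb

# Illustrative sketch; function names are hypothetical.

def pmf(k, p, n=6):
    """P{X = k}: n independent rats, each finding the right path with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def performance(p, threshold=3, n=6):
    """Probability that the rule 'some memory is left if X >= threshold' says so."""
    return sum(pmf(k, p, n) for k in range(threshold, n + 1))

for k in range(7):
    print(k, round(pmf(k, 0.2), 4))       # part (a)
for p in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(p, round(performance(p), 4))    # part (b)
```

The value at p = .2 is the probability that the rule wrongly credits memory to rats which have none.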


2·8. Evaluation of competing risks

2·8·1. Notion of competing risks. The present Section is intended to serve a dual purpose: to provide interesting material for the application of theorems on probability and to introduce the reader to problems of application of probability with their specific difficulties. These difficulties are of two different kinds. The first difficulty is that no practical problem is directly concerned with mathematical concepts, such as fundamental probability sets, which are at the basis of the theory of probability. Therefore, the practical problem must be translated into probabilistic terms before its solution is attempted by means of probability theory. In fact, this translation amounts to building up a mathematical model of the practical problem. Ordinarily there are many different ways in which a mathematical model can be built, so that there is a question of which to choose. In certain cases, the adequacy of the particular model can be tested empirically. In other cases, tests of this kind are very difficult and we must rely upon our intuitive feeling that the machinery postulated in the model corresponds satisfactorily to the phenomena studied. Naturally, the solution of the problem based on a given model applies to the model itself and not necessarily to the phenomena for which it was intended. The degree of correspondence between the mathematical solution and the phenomena depends on the adequacy of the model.

The second kind of difficulty in treating practical problems is that most observable phenomena are rather complicated. Thus, if we analyze them properly and build up a mathematical model which appears adequate, it frequently happens that this model itself is so complicated that we experience purely mathematical difficulties in obtaining the desired solution. In many cases of this kind we are forced to admit defeat and to revise the model, sacrificing its adequacy in order to gain simplicity.

The practical problem chosen to illustrate the above circumstances is 
the problem of the evaluation of competing risks. This problem occurs in 
many fields of biological, medical, and physical research and has a number 
of different forms. We shall illustrate it on a simple example. However, the 
discussion of this example will indicate how to deal with more general 
problems. 

In judging the effectiveness of a specified treatment of a recurrent disease, such as cancer or tuberculosis, it is important to know the following: (i) how frequently an apparent recovery is interrupted, within a specified time T, by a relapse of the same illness and (ii) how frequently the relapse ends in death during the time T following the relapse.

At first sight the solution of these two problems may seem easy. For example, in order to answer the first question, one might suggest isolating a large homogeneous group of, say, N = 10,000 patients who underwent the specified treatment and apparently recovered. Then one would count the number $n_1$ of those who had a relapse within time T following recovery. The relative frequency of relapse, say $Q_1 = n_1/N$, could be used to characterize the treatment. If two alternative treatments, A and B, are suggested, then that treatment for which $Q_1$ is smaller could be considered the better treatment.

Unfortunately, there are considerable practical difficulties in applying the suggestion just described. One such difficulty is that it is impossible to isolate a substantial group of persons who have recovered from an illness and are returning to normal life. After a year or so these persons will be dispersed, so that it is difficult to locate all of them in order to ascertain whether they are alive or dead, well or relapsed. The second difficulty (which in reality is of the same kind as the first) is particularly apparent when we consider the problem of comparing two treatments A and B applied to two different groups of persons $G_1$ and $G_2$ living in different conditions. For example, imagine that the group $G_1$ lives in excellent conditions so that during the year following their apparent recovery from the original illness only a few persons of this group die from causes not connected with the illness studied. Assume further that group $G_2$ is predominantly composed of miners who are exposed to daily risks of all sorts of disasters and to the various illnesses connected with the conditions of their work. It is obvious that at the end of the first month of the period of observation T, the group $G_1$ may lose no members at all and, thus, during the second month, all 10,000 of them will be exposed to the risk of relapse into the illness studied. On the other hand, during the first month of observation, causes not connected with this illness may kill a substantial number of the members of group $G_2$ who, therefore, will be prevented from suffering a relapse in the second and subsequent months of the period of observation. Thus, it is clear that the risk of death from other causes "competes" with the risk of relapse. As a result, if the treatments A and B are of exactly the same effectiveness but group $G_1$ is exposed to a mild risk of death from other causes while group $G_2$ suffers an intense risk, then $Q_1$, the relative frequency of relapse, computed for group $G_1$ is likely to be larger than that for group $G_2$. It follows that the quotient $Q_1$ does not characterize the risk of relapse alone but depends also on the competing risk of death. Furthermore, the reader will have no difficulty in establishing that the quotient $Q_1$ also depends on the intensity of the "risk" of the outgoing patient being "lost" within the time T, so that at the conclusion of this period it is impossible to locate him and to ascertain his fate.

For these reasons quotients of the type $Q_1$ are not appropriate characteristics of the corresponding risks and are called crude rates of these risks. Having established this point we must now see how we can define a number which would be an acceptable characterization of the risk's intensity. The concept which suggests itself is that of the long run relative frequency of the given risk if it were observed in artificial conditions where all other risks were eliminated. This long run relative frequency may be called the net rate of the risk within time T and denoted by P.

The reader will realize that, although there exist examples of laboratory experiments, perhaps in physics, where the net rate of a given risk may be observed directly, in most biological phenomena the net rate of a given risk is necessarily an abstraction. In particular, the net rate of relapse into a given illness is hardly observable because it is impossible to eliminate the risk of death. It is true that, by shortening the time of observation, we may decrease the risk of death. However, the risk of relapse will decrease also and, on closer examination, the problem is essentially the same, whether the time T is long or short.
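The effect of a competing risk on the crude rate can be illustrated numerically. The sketch assumes, purely for illustration, that each day a patient who is alive and free of relapse relapses with a fixed probability and dies from other causes with another fixed probability, at most one of these happening on any one day; the daily probabilities and the function names are hypothetical. With the same relapse probability in both groups, the crude relapse rate is smaller for the group with the heavier risk of death, while the net rate (death eliminated) is the same for both.

```python
# Illustrative sketch; daily probabilities and function names are hypothetical.

def crude_relapse_rate(q_relapse, q_death, days):
    """Share of patients who relapse within `days` when death from other causes competes."""
    stay = 1 - q_relapse - q_death  # neither relapse nor death on a given day
    # relapse on day k: k - 1 quiet days, then a relapse
    return sum(stay ** (k - 1) * q_relapse for k in range(1, days + 1))

def net_relapse_rate(q_relapse, days):
    """The same share with the competing risk of death eliminated."""
    return crude_relapse_rate(q_relapse, 0.0, days)

q_relapse, days = 0.001, 365   # same treatment effect in both groups
mild, intense = 0.0001, 0.002  # daily risks of death from other causes
print(round(crude_relapse_rate(q_relapse, mild, days), 4))     # group with mild risk
print(round(crude_relapse_rate(q_relapse, intense, days), 4))  # group with intense risk
print(round(net_relapse_rate(q_relapse, days), 4))             # net rate, both groups
```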

2·8·2. A mathematical model. In introducing the concepts of crude and net rates of a given risk, we have made a step towards building up a mathematical model of the phenomenon of competing risks. However, it is clear that this step is only preliminary and, by itself, is not adequate for the formulation of a mathematical problem. Further steps in the same direction may be of many kinds and of varying complexity. We shall limit our study to a simple scheme which requires only elementary mathematical tools. In order to make this scheme easy to apply to a variety of problems and, at the same time, in order to avoid an excessively abstract presentation, each of the steps leading to the mathematical model will be discussed twice. First we shall speak in general terms of arbitrary risks. Next we shall repeat the description of the steps with particular reference to an outgoing patient just recovered from a disease D and exposed to the risk of death from causes not related to D, to the risk of a relapse, and to the risk of death following the relapse of D.

Before proceeding further it will be convenient to establish a flexible terminology applicable to a variety of problems. In everyday life the word "risk" is used to describe a possible future event which is undesirable. Thus we speak of the risk of death, of the risk of fire, and so forth. It will be convenient to abandon the limitation of undesirability and to use the word risk in relation to any possible future event, desirable or not. Thus, in parallel with the risk of death we shall discuss the "risk" of survival, etc.

In this dual discussion we shall introduce certain concepts and symbols. In order to make a distinction between the discussion of a symbol with reference to the general problem of risks and with reference to the particular problem of an outgoing patient, the symbols relating to the former will be marked with asterisks. Thus, for example, the symbols $S_0^*$ and $S_0$ will denote the same concept. However, $S_0^*$ will denote this concept in relation to the general problem and $S_0$ in relation to the problem of the outgoing patient.

Whatever the problem on risks, it always concerns the fate of an individual or, in more complicated cases which are not treated in this section, of a group of individuals. The word "individual" may refer to a person, an animal, an atom, an aircraft, etc. In each case, it is contemplated that at different moments of the time T covered by the study, the individual concerned may be in one of several exclusive states. For example, at any given moment after apparent recovery from a disease D, an outgoing patient (= individual) may be in state $S_0$, defined as being alive and without relapse into disease D; alternatively, he may be in state $S_1$, defined as being dead from causes not connected with disease D; or, another possibility at the moment considered, the individual may be in state $S_2$ = suffering a relapse of D; or he may be in $S_3$ = dead as a result of the previous relapse.





In addition to these four exclusive states we may contemplate many more. For example, we have mentioned above the possibility of state $S_4$ = being lost from observation. In practical situations, it is easy to distinguish a great number of possible states in which the individual studied may be at any given moment. However, theoretical studies of problems involving many different states are complicated. For this reason we are forced to consider mathematical models involving only a few possible states. In order to illustrate this point we shall mention that by limiting our considerations to $S_0$, $S_1$, $S_2$, $S_3$ and $S_4$ as defined above we postulate that during time T it is impossible for the outgoing patient to recover from a relapse; he either continues in the relapse or dies from it. A postulate of this kind is an example of sacrificing the adequacy of the model in order to gain simplicity. With regard to certain diseases of short duration and frequent recovery the loss in adequacy due to this postulate would be too great to be admissible. However, in problems concerning diseases of long duration, such as cancer or tuberculosis, the discrepancy between reality and the postulate that in time T recovery from relapse is impossible may be trivial.

The essential point in the above discussion is that the first step in studying a problem of risks consists in postulating a certain number of possible mutually exclusive states, say $S_0^*, S_1^*, \ldots, S_n^*$, in which the individual may find himself at any given moment of the period T covered by the study. The symbol $S_0^*$ will be used to denote that particular state in which the individual is at the beginning of the period of observation. Thus, in the problem of an outgoing patient, $S_0$ means the state of being alive and free from relapse of the disease D.

The concept of exclusive possible states will be used to give a precise meaning to the word risk. In postulating the existence of n + 1 different states in which an individual may find himself during time T, we admit the possibility of transfers from one state to another. In general we shall find it convenient to consider that an individual faces n + 1 possibilities at any given moment: he may remain in the same state or be transferred into any one of the n other states. All these n + 1 possibilities will be described as risks. If, at a given moment, the individual is in state $S_i^*$, then $R_{ii}^*$ will denote the risk of his remaining in this state and $R_{ij}^*$ the risk of being transferred from state $S_i^*$ to state $S_j^*$.

According to the circumstances of a given problem, some of the risks may be impossible. Thus, for example, transfer from the "state of being dead" into the "state of being alive" is impossible. In discussing risks it is occasionally convenient to describe some risks as direct and some others as contingent. If, in the circumstances of a given problem, an individual can be transferred from state $S_i^*$ directly into state $S_j^*$, without passing through any other state, then the risk $R_{ij}^*$ is a direct risk. On the other hand, if the transfer from $S_i^*$ into $S_j^*$ is possible only by transferring the individual first from $S_i^*$ into, say, $S_h^*$ and then from $S_h^*$ into $S_j^*$, then the risk $R_{ij}^*$ is called contingent, namely, contingent on $R_{ih}^*$.

With reference to the problem of the risks of an outgoing patient, the symbol $R_{00}$ denotes the risk that a patient who is in state $S_0$ will remain in this state. Similarly, $R_{23}$ stands for the risk of transfer from state $S_2$ (relapse) into state $S_3$ (death after relapse). The symbol $R_{32}$ means the risk of transfer from state $S_3$ into state $S_2$. However, with the particular meaning of these states, $R_{32}$ is impossible. The risks $R_{01}$ and $R_{02}$ (the risk of death from other causes and the risk of relapse) threaten the outgoing patient immediately after his apparent recovery. Therefore, they are direct risks. Risk $R_{23}$ also is direct since an individual may be transferred directly from $S_2$ to $S_3$. On the other hand, risk $R_{03}$ (of death from a relapse of a person who is not suffering a relapse) is contingent on the direct risk $R_{02}$ because the patient may die from a relapse only after first suffering a relapse. In other words, a transfer from state $S_0$ into state $S_3$ is possible only by transferring first from $S_0$ to $S_2$ and then from $S_2$ to $S_3$.

Using the concepts of possible and impossible risks and of direct and contingent risks, the adopted model of the problem of risks of an outgoing patient may be described as involving four possible states, three possible direct risks of transfer, namely $R_{01}$, $R_{02}$, and $R_{23}$, and one contingent risk of transfer, $R_{03}$.
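The classification of risks into direct, contingent and impossible can be derived mechanically from the list of possible direct transfers: a transfer is contingent when it is not direct but the target state can still be reached through intermediate states. A sketch for the outgoing patient, with states numbered as in the text; the function name is hypothetical.

```python
# Illustrative sketch; the function name is hypothetical.

def classify_risks(n_states, direct):
    """Split transfers i -> j (i != j) into contingent and impossible ones.

    `direct` is the set of pairs (i, j) for which a direct transfer is possible.
    """
    reach = {i: set() for i in range(n_states)}
    for i in range(n_states):
        stack = [i]
        while stack:  # depth-first search over chains of direct transfers
            s = stack.pop()
            for (a, b) in direct:
                if a == s and b not in reach[i]:
                    reach[i].add(b)
                    stack.append(b)
    contingent = {(i, j) for i in range(n_states) for j in reach[i]
                  if i != j and (i, j) not in direct}
    impossible = {(i, j) for i in range(n_states) for j in range(n_states)
                  if i != j and j not in reach[i]}
    return contingent, impossible

# S0 alive, no relapse; S1 dead (other causes); S2 relapse; S3 dead after relapse.
DIRECT = {(0, 1), (0, 2), (2, 3)}
contingent, impossible = classify_risks(4, DIRECT)
print(sorted(contingent))  # [(0, 3)]
print(sorted(impossible))
```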

The most essential point of the mathematical model of risks studied in this section is the splitting of the time T of observation into a large number of small intervals of time of equal length τ. Whatever the possible and mutually exclusive states contemplated, say $S_0^*, S_1^*, \ldots, S_n^*$, and whatever the risks $R_{ij}^*$, it will be postulated that the time of observation T can be split into M intervals of length $\tau = T/M$, say

$$\tau_1, \tau_2, \ldots, \tau_k, \tau_{k+1}, \ldots, \tau_M,$$

so short that it is impossible for the individual concerned to be affected by more than one risk within any one interval. This implies that, if at the beginning of the interval $\tau_k$ the individual happens to be in a particular state $S_i^*$, then he spends $\tau_k$ in one of not more than n + 1 mutually exclusive ways: either he stays in $S_i^*$ for the whole time interval or he is transferred into one of the other states, say $S_j^*$. In the latter case, the beginning of $\tau_{k+1}$ will find the individual in state $S_j^*$.

The time intervals $\tau_k$ will be called elements of time. According to circumstances, the time interval may be thought of as a single day, an hour, a second, etc. Assume tentatively that τ = a period of 24 hours, say from noon to noon, and consider the problem of risks facing an outgoing patient. If on January 1st, 1950 at 12 noon the patient is alive and is not suffering a relapse, then the postulate of elements of time means that during the next 24 hours the following three things can happen to him: (a) he can continue to live and be free from relapse, (b) he can die from causes other than disease D, and (c) he can suffer a relapse of disease D. However, it is postulated that it is impossible for him to suffer a relapse and then die before noon of January 2nd, 1950. The possibility of death after relapse is admitted only after 12 noon, January 2nd.

The reader will see that the postulate of time element is a considerable simplification. The longer the time element, the more artificial this simplification becomes. In the example just discussed we excluded the possibility of a relapse at 12:01 p.m. on January 1st and of death at 11:59 a.m. on January 2nd. On the other hand, we admit the possibility of a relapse at 11:59 a.m. on January 2nd and of death, only two minutes later, at 12:01 p.m. of the same day. This is still another sacrifice in the adequacy of the mathematical model in order to gain simplicity. However, the reader is likely to feel intuitively that if the postulated time element is very short, that is, if M is very large, this simplification is less objectionable.

Thus far, our model does not involve any element of randomness. Randomness is introduced in the choice among the possible ways in which the individual spends a given time element. Assume that at the beginning of the interval $\tau_k$ the individual is in the state $S_i^*$. We postulate that in order to determine how the time $\tau_k$ will be spent:

(i) A ball is drawn from a bag which we shall denote as $B_i^*(k)$.

The subscript i attached to $B_i^*(k)$ refers to the state $S_i^*$ and the letter k to the time element $\tau_k$. The bag $B_i^*(k)$ contains numbered balls.

(ii) The individual suffers the risk $R_{ij}^*$, where j is the number on the ball drawn from $B_i^*(k)$.

If it happens that j = i, then the individual remains in $S_i^*$ for the duration of $\tau_k$. Otherwise, if j ≠ i, the time element $\tau_k$ is spent in transferring him from state $S_i^*$ to $S_j^*$.
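The drawing procedure (i)–(ii) can be imitated by a short simulation that follows one outgoing patient through M time elements. The sketch assumes hypothetical transition probabilities (the proportions of numbered balls in each bag), taken for simplicity to be the same for every time element k, although the model allows them to depend on k; the function name is also hypothetical. Each bag contains only balls whose numbers correspond to possible risks, so, for instance, a bag for $S_0$ holds no ball numbered 3.

```python
import random

# Illustrative sketch; transition probabilities and names are hypothetical.
# Q[i][j]: proportion of balls numbered j in the bag for state S_i.
Q = {
    0: {0: 0.997, 1: 0.001, 2: 0.002},  # alive, no relapse: stay, die of other causes, relapse
    1: {1: 1.0},                        # dead from other causes: no return
    2: {2: 0.99, 3: 0.01},              # relapse: continue in relapse or die from it
    3: {3: 1.0},                        # dead after relapse: no return
}

def simulate_history(M, q=Q, seed=None):
    """Follow one individual through M time elements, one draw per element."""
    rng = random.Random(seed)
    state, path = 0, [0]  # the individual starts in S_0 at the beginning of tau_1
    for _ in range(M):
        numbers = list(q[state])                   # numbers on the balls in the current bag
        weights = [q[state][j] for j in numbers]   # their proportions
        state = rng.choices(numbers, weights=weights)[0]  # risk R_{state, j} is suffered
        path.append(state)
    return path

path = simulate_history(M=365, seed=1)
print(path[-1])  # state of the individual at the conclusion of time T
```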

It follows that, in the model studied, it is necessary to consider several sequences of bags, as many sequences as there are different states contemplated. The particular bags of any sequence correspond to consecutive elements of time. There will be M bags $B_0^*(k)$, with k = 1, 2, …, M, in the sequence corresponding to the state $S_0^*$. The reason for this is that at the beginning of $\tau_1$ the individual is known to be in $S_0^*$ and during any other time element $\tau_k$ he could remain in $S_0^*$. The number of bags in a sequence corresponding to any other state $S_i^*$ cannot exceed M − 1. In fact, whatever i ≠ 0, the individual cannot be in $S_i^*$ at the beginning of $\tau_1$.

Let $q_{ij}^*(k)$ denote the proportion of balls in the bag $B_i^*(k)$ which bear the number j, for j = 0, 1, 2, …, n. According to what was just said, the proportion $q_{ij}^*(k)$ represents the conditional probability that, during time element $\tau_k$, the individual will be transferred from state $S_i^*$ to state $S_j^*$, given that he was in $S_i^*$ at the beginning of $\tau_k$. It may happen that, for a certain number r, the proportion $q_{ir}^*(k) = 0$. Then the bag $B_i^*(k)$ does not contain any balls marked with the number r and the individual cannot be transferred from $S_i^*$ to $S_r^*$ during time $\tau_k$. Obviously, the probabilities must satisfy the condition

$$q_{i0}^*(k) + q_{i1}^*(k) + \cdots + q_{in}^*(k) = 1.$$

The probabilities $q_{ij}^*(k)$ are sometimes called transition probabilities.

With this general model, the study of the fate of an individual during time T is reduced to the study of the possible outcomes of M successive draws of balls. The final outcome of these draws is characterized by the number on the last ball drawn. This ball is drawn from one of the bags $B_i^*(M)$. The number marked on it determines the state of the individual at the conclusion of time T. Thus the absolute probability that the last ball drawn will bear the number j represents the probability that the individual will be in state $S_j^*$ at the conclusion of time T. This probability is called the crude rate of the risk $R_{0j}^*$ in time T and is denoted by $Q_j^*$. In operational terms, it tells how frequently an individual starting from state $S_0^*$ will be found in state $S_j^*$ at the conclusion of time T.

This general discussion of the mathematical model will now be interpreted in relation to the problem of the outgoing patient. We shall postulate only four possible states: $S_0$ = being alive and free from relapse, $S_1$ = being dead from causes other than disease $D$, $S_2$ = having a relapse of $D$ and $S_3$ = being dead following a relapse. Thus we shall need four sequences of bags, $\{B_0(k)\}$ corresponding to $S_0$, $\{B_1(k)\}$ corresponding to $S_1$, $\{B_2(k)\}$ corresponding to $S_2$ and $\{B_3(k)\}$ corresponding to $S_3$. An individual assumed to have recovered from $D$ must first draw a ball from bag $B_0(1)$. This draw determines how he spends the first time element $\tau_1$. The circumstances of the problem imply that the bag $B_0(1)$, and also any other bag $B_0(k)$ of the same sequence, can include balls with numbers 0, 1 and 2 but no balls with number 3. In fact, the postulate of elements of time excludes the possibility that within the same element of time the individual may both suffer a relapse from disease $D$ and die from this relapse. The proportions $q_{00}(1)$, $q_{01}(1)$ and $q_{02}(1)$ of the different balls in bag $B_0(1)$ are the probabilities that the individual spends time $\tau_1$ in state $S_0$, that he dies during $\tau_1$ from causes other than $D$, and that he has a relapse during $\tau_1$. These probabilities must add up to unity. Similarly the proportions $q_{00}(k)$, $q_{01}(k)$, $q_{02}(k)$ are the probabilities, given that the individual was alive and not suffering a relapse at the beginning of $\tau_k$, that within $\tau_k$ he will continue in the same state, that he will die from other causes, and that he will suffer a relapse, respectively.

There is no return from state $S_1$ nor from state $S_3$ of the dead. Therefore the circumstances of the present problem imply that the bags $B_1(k)$ contain only one kind of balls, those bearing the number 1. Also the bags $B_3(k)$ contain only one kind of balls, those bearing the number 3. Therefore, the proportions $q_{11}(k) = q_{33}(k) = 1$, for $k = 2, 3, \ldots, M$.




Finally, bags $B_2(k)$ of the sequence corresponding to state $S_2$ (having a relapse of the disease $D$) include balls with No. 2 and balls with No. 3. This corresponds to our simplifying assumptions that there is no recovery from a relapse, at least within time $T$, and that, should death occur during a relapse, this will be due to disease $D$ and not to any other cause. The proportion $q_{23}(k)$ represents the probability of death during $\tau_k$ conditional upon the assumption that at the beginning of this time element the individual was suffering from a relapse of the disease $D$. The proportion $q_{22}(k) = 1 - q_{23}(k)$.

The entire model may be represented diagrammatically by a sequence of letters $\tau$ corresponding to consecutive elements of time and by four sequences of letters $S$ referring to different states in which the individual may find himself at the beginning of each time element.


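[Diagram: the time elements $\tau_1, \tau_2, \tau_3, \ldots, \tau_k, \tau_{k+1}, \ldots, \tau_{M-1}, \tau_M$ in a row, with the states $S_0, S_1, S_2, S_3$ beneath them and arrows marking the possible transfers.]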



Here arrows indicate the transfers from one state to another which are possible in this particular problem. In order to decide on the transfer within time element $\tau_k$, the individual draws a ball from a bag corresponding to the particular time element $\tau_k$ and to the state in which he happens to be at the beginning of this time element. The crude rate $Q_j$ of any of the four risks $R_{0j}$, $j = 0, 1, 2, 3$, is the probability that the last of the $M$ draws will yield ball No. $j$.
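The draws can be imitated numerically. The following Python sketch is an illustration added here, not part of the original text: it assumes transition probabilities that do not change from one time element to the next (the simplification adopted later in this section), and the function names and the values of $q_{01}$, $q_{02}$, $q_{23}$ and $M$ are hypothetical. Each simulated individual starts in $S_0$ and draws once per time element from the bag of his current state; the relative frequencies of the final states estimate the crude rates $Q_0, Q_1, Q_2, Q_3$.

```python
import random

def final_state(q01, q02, q23, M, rng):
    """Perform the M draws for one individual starting in S_0; return the last ball (0-3)."""
    state = 0
    for _ in range(M):
        u = rng.random()
        if state == 0:                  # bag B_0(k): balls No. 0, 1 and 2
            if u < q01:
                state = 1
            elif u < q01 + q02:
                state = 2
        elif state == 2:                # bag B_2(k): balls No. 2 and 3
            if u < q23:
                state = 3
        # bags B_1(k) and B_3(k) hold only balls of their own number: no change
    return state

def estimate_crude_rates(q01, q02, q23, M, trials=50_000, seed=1):
    """Relative frequencies of the final states 0, 1, 2, 3 among simulated individuals."""
    rng = random.Random(seed)
    counts = [0, 0, 0, 0]
    for _ in range(trials):
        counts[final_state(q01, q02, q23, M, rng)] += 1
    return [c / trials for c in counts]

# Hypothetical values, chosen only for illustration.
print(estimate_crude_rates(q01=0.02, q02=0.03, q23=0.10, M=12))
```

With many trials the estimate of $Q_0$ should come close to $(1 - q_{01} - q_{02})^M$, the value derived in subsection 2·8·4.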

It is important to be clear about the fundamental probability sets to which the crude rates of risk refer. The reader will have no difficulty in establishing that all the crude rates considered in any given problem are probabilities referring to the same fundamental probability set, say $\Sigma_i$, composed of all the possible combinations of outcomes of the $M$ consecutive draws which determine the fate of the individual during time $T$. Although more complicated, the structure of the fundamental probability set $\Sigma_i$ is similar to that appropriate to the controversial problem of subsection 2·4·1.
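For small $M$ the fundamental probability set $\Sigma_0$ of the outgoing-patient problem can be listed in full. The sketch below (hypothetical names; constant transition probabilities assumed for simplicity) runs through every sequence of $M$ ball numbers, multiplies the proportions of the balls drawn along the way, and adds the products according to the number on the last ball. Sequences that the bags make impossible simply receive probability zero, so the four totals add up to one.

```python
from itertools import product

def bag_contents(q01, q02, q23):
    """Row i lists the proportions of balls numbered 0, 1, 2, 3 in bag B_i(k)."""
    return [
        [1 - q01 - q02, q01, q02, 0.0],  # S_0: alive and free from relapse
        [0.0, 1.0, 0.0, 0.0],            # S_1: dead from other causes
        [0.0, 0.0, 1 - q23, q23],        # S_2: relapse
        [0.0, 0.0, 0.0, 1.0],            # S_3: dead following a relapse
    ]

def crude_rates_by_enumeration(q01, q02, q23, M):
    """Sum the probabilities of all sequences of M draws, grouped by the last ball."""
    q = bag_contents(q01, q02, q23)
    Q = [0.0] * 4
    for balls in product(range(4), repeat=M):   # 4**M candidate sequences
        prob, state = 1.0, 0                    # every individual starts in S_0
        for ball in balls:
            prob *= q[state][ball]              # draw from the bag of the current state
            state = ball                        # the ball drawn gives the next state
        Q[state] += prob                        # impossible sequences add zero
    return Q

Q = crude_rates_by_enumeration(0.02, 0.03, 0.10, M=6)
print(Q, sum(Q))
```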

Now we shall discuss the exact definition of the net rate of a given risk. As stated vaguely before, the net rate of a given risk, say $R_{ij}$, during time $T$ means the probability, say $P_{ij}$, of succumbing to this risk within time $T$ in artificial conditions where all other risks are eliminated. Interpreting this description within the mathematical model introduced, we first notice that each probability $P_{ij}$ refers to a fundamental probability set, say $\Sigma_{ij}$, which differs from the fundamental probability set $\Sigma_{ij'}$ corresponding to any other probability $P_{ij'}$ of the same kind. Also, the fundamental probability set $\Sigma_{ij}$ differs from the fundamental probability set $\Sigma_i$. Since $P_{ij}$ is concerned with the risk $R_{ij}$ during time $T$, it follows that each element of the set $\Sigma_{ij}$ is again a combination of the results of $M$ consecutive draws performed by the individual. In this regard, $\Sigma_{ij}$ is similar to $\Sigma_i$. The distinction between the two sets is introduced by the prescription that, in the case of a net rate of risk, the exposure considered be observed "in artificial conditions where all other risks are eliminated." In order to give effect to this restriction we must consider the sequence of $M$ bags $B_i(k)$ corresponding to state $S_i$ with their contents modified. Namely, in order to eliminate all other risks but the risk $R_{ij}$, we consider that all the balls in $B_i(k)$ which bear numbers $m \neq j$ are replaced by balls bearing the number $i$. The original bag $B_i(k)$ with its contents modified in this manner may be denoted as $B_{ij}(k)$. It contains the same number of balls as the bag $B_i(k)$. Also the number of balls marked $j$ is the same, so that the probability of drawing a ball marked $j$ is the same, namely $q_{ij}(k)$. However, all the other balls in $B_{ij}(k)$ are marked with an $i$.

In order to define the net rate $P_{ij}$ it is sufficient to consider only one sequence of $M$ bags, namely $B_{ij}(1), B_{ij}(2), \ldots, B_{ij}(M)$. The individual is required to draw a ball from each of these bags. If all the balls drawn bear the number $i$, he escapes the risk $R_{ij}$ during the entire time $T$. Otherwise, he succumbs to this risk; the net rate $P_{ij}$ represents the probability of this latter occurrence. This remark leads to an easy computation of the net rate $P_{ij}$. In fact, $1 - P_{ij}$ is the probability of escaping risk $R_{ij}$ during time $T$, that is, during the $M$ consecutive draws from the bags $B_{ij}(k)$, $k = 1, 2, \ldots, M$. The multiplication theorem gives

$$1 - P_{ij} = \prod_{k=1}^{M} \left[1 - q_{ij}(k)\right].$$

Thus the net rate

$$P_{ij} = 1 - \prod_{k=1}^{M} \left[1 - q_{ij}(k)\right].$$
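The product formula can be evaluated directly. In this sketch (hypothetical function name, illustrative numbers) the argument lists the proportions $q_{ij}(1), \ldots, q_{ij}(M)$, so the intensity of the risk is allowed to change from one time element to the next.

```python
from math import prod

def net_rate(q_by_element):
    """Net rate P_ij = 1 - prod_k [1 - q_ij(k)] for the list [q_ij(1), ..., q_ij(M)]."""
    return 1 - prod(1 - q for q in q_by_element)

# Illustrative proportions for M = 4 time elements with increasing intensity.
print(net_rate([0.01, 0.02, 0.03, 0.04]))
```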

With reference to the particular problem of the outgoing patient we contemplated four different states $S_0, S_1, S_2, S_3$. Whatever the values of $i$ and $j = 0, 1, 2, 3$, $i \neq j$, the net rate of the risk $R_{ij}$ means the probability $P_{ij}$, computed under the assumption that at the beginning of the first element of time an individual is in state $S_i$, that at the end of the $M$th element of time he will be in state $S_j$. This probability $P_{ij}$ is to be computed under the further assumption that during all the $M$ elements of time all other risks are eliminated.

It is easy to see that, with the interpretation adopted above, the net rates

$$P_{10} = P_{12} = P_{13} = P_{03} = P_{20} = P_{21} = P_{30} = P_{31} = P_{32} = 0.$$

Consider, for example, the net rate $P_{03}$. According to the definition, this is the probability that during $M$ consecutive elements of time, an individual alive at the beginning of this period and free from disease $D$ will die from a relapse in conditions where all other risks are eliminated. If we give effect to this restriction, then we shall have to eliminate the risk of relapse. But then death after relapse will be impossible. The same result is obtained automatically following the general method described above. In order to compute $P_{03}$ we take the $M$ bags $B_0(k)$ and replace all the balls not marked 3 with balls marked with zero. However, none of the bags $B_0(k)$ contain balls marked 3. Therefore the described modification of the contents of bags $B_0(k)$ will fill them with balls marked with zero and no others. Thus, the probability of drawing a ball marked 3 from any one of the bags is zero.

There are three risks with net rates which are not necessarily zero. These are $R_{01}$, $R_{02}$, and $R_{23}$. The net rate of $R_{01}$ is the probability that within the $M$ consecutive time elements the outgoing patient will die from causes other than the original disease $D$, computed under the assumption that all the other risks are eliminated. In order to give effect to this restriction and to compute the net rate $P_{01}$, we consider the sequence of $M$ bags $B_{01}(1), B_{01}(2), \ldots, B_{01}(k), \ldots, B_{01}(M)$. Each bag $B_{01}(k)$ corresponds to the bag $B_0(k)$. Bag $B_{01}(k)$ contains the same number of balls as bag $B_0(k)$. Also, the same number bear the numeral one. However, all the other balls in bag $B_{01}(k)$ are marked zero, even though bag $B_0(k)$ may contain some balls marked 2. This gives effect to the assumption that the risk of relapse is eliminated. The individual draws a ball out of each bag $B_{01}(k)$. If all the $M$ balls drawn are No. 0, then the individual escapes death from causes other than $D$ during the whole period of time $T$. The probability of this is

$$\prod_{k=1}^{M} \left[1 - q_{01}(k)\right].$$

Otherwise, he dies from these other causes. The probability that this will happen is just the net rate $P_{01}$ and

$$P_{01} = 1 - \prod_{k=1}^{M} \left[1 - q_{01}(k)\right].$$

Similarly it is easy to find that

$$P_{02} = 1 - \prod_{k=1}^{M} \left[1 - q_{02}(k)\right], \qquad P_{23} = 1 - \prod_{k=1}^{M} \left[1 - q_{23}(k)\right].$$




It is worthwhile to emphasize that the fundamental probability sets $\Sigma_{01}$ and $\Sigma_{02}$, to which the probabilities $P_{01}$ and $P_{02}$ refer, are different. Also each of these sets differs from the F.P.S. $\Sigma_0$ to which the probabilities described as crude rates refer. If the reader does not appreciate this, he may be puzzled by problems of the following kind: in a bombing attack on a specified target a bomber may be shot down by flak or by enemy fighters. The net rate of the first of these risks (that is, the probability of being shot down by flak when there are no fighters) is .55; the net rate of the second risk (that is, the probability of being shot down by fighters when there is no flak) is .65. However, the property of being shot down by flak and the property of being shot down by fighters are exclusive. Hence, by the addition theorem, the probability that the bomber will be shot down either by flak or by fighters is .55 + .65 = 1.2 > 1. How can this be?
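One way to see what goes wrong is to compute the crude rates, which all refer to the single set $\Sigma_0$. The sketch below is only an illustration: it assumes that the exposure is divided into $M = 100$ time elements and that each risk has constant intensity, so that a constant proportion $q$ reproduces each net rate through $1 - (1 - q)^M$; the names and the value of $M$ are hypothetical. Following the draws one at a time, each crude rate comes out smaller than the corresponding net rate, and the crude rates together with the probability of surviving add up to one.

```python
def bomber_crude_rates(q_flak, q_fighters, M):
    """Probabilities of the three outcomes after M draws, followed draw by draw.

    Returns (still flying, shot down by flak, shot down by fighters);
    the last two states are absorbing.
    """
    p0, p1, p2 = 1.0, 0.0, 0.0
    for _ in range(M):
        p0, p1, p2 = p0 * (1 - q_flak - q_fighters), p1 + p0 * q_flak, p2 + p0 * q_fighters
    return p0, p1, p2

M = 100                                    # assumed number of time elements
net_flak, net_fighters = 0.55, 0.65        # net rates quoted in the text
q_flak = 1 - (1 - net_flak) ** (1 / M)     # constant proportion giving net rate .55
q_fighters = 1 - (1 - net_fighters) ** (1 / M)
Q0, Q1, Q2 = bomber_crude_rates(q_flak, q_fighters, M)
print(f"crude rates: flak {Q1:.3f}, fighters {Q2:.3f}, survive {Q0:.3f}")
```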

This completes the construction of the simplified model of competing risks. If the probabilities of transfer $q_{ij}(k)$ are allowed to vary with $k = 1, 2, \ldots, M$, then we say that the intensity of the risk $R_{ij}$ varies in time $T$. In many cases this is a very natural assumption to make. However, the reader will realize that varying intensity of risks must introduce considerable complications in the model. On the other hand, if the period of time $T$ is not very long, one may expect that the changes in the probabilities of transfer will be unimportant. For these reasons our further study will be based on the assumption that the intensity of every risk considered is constant throughout time $T$ and, therefore, that $q_{ij}(k) = q_{ij}$ for $k = 1, 2, \ldots, M$. With this simplifying assumption we shall deduce formulae for the crude rates of risk involved in the problem of outgoing patients. Also we shall establish the relation between these crude rates and the corresponding net rates of risks. Although the results thus obtained will relate to a single particular problem, the method of reaching these results is applicable to many other problems.

In the notation which we have already adopted, $Q_1$, $Q_2$, and $Q_3$ stand for the crude rates and $P_{01}$, $P_{02}$, and $P_{23}$ for the net rates of risks that an outgoing patient will die from causes other than the disease $D$, will have a relapse, and will die following a relapse, respectively. The formulae for the net rates of the three risks are obtained directly from those already deduced by substituting $q_{0i}(k) = q_{0i}$ for $i = 1, 2$, and $q_{23}(k) = q_{23}$. We have

$$P_{01} = 1 - (1 - q_{01})^M, \qquad P_{02} = 1 - (1 - q_{02})^M, \qquad P_{23} = 1 - (1 - q_{23})^M.$$
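With constant intensity each net rate depends on one transition probability and on $M$ only, and the relation is easily inverted. A minimal sketch with hypothetical names; the numbers in the usage line are illustrative.

```python
def net_rate_constant(q, M):
    """Net rate 1 - (1 - q)**M for a transition probability q that is the same in every element."""
    return 1 - (1 - q) ** M

def transition_prob_from_net(P, M):
    """Inverse relation: q = 1 - (1 - P)**(1/M)."""
    return 1 - (1 - P) ** (1 / M)

q = transition_prob_from_net(0.6, M=720)   # illustrative: net rate .6 over 720 elements
print(q, net_rate_constant(q, M=720))      # the second number reproduces .6 up to rounding
```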

Before deducing the expressions for the crude rates of the three risks we make a brief digression and remind the reader of certain formulae from elementary algebra.




2·8·3. Geometric progression. The term geometric progression is used to describe a finite or infinite sequence of numbers which has the property that the quotient of each number divided by its predecessor has the same value $r$, no matter which pair of two successive numbers is considered. Thus, if the numbers $a_1, a_2, \ldots, a_i, a_{i+1}, \ldots, a_n$ form a geometric progression then, whatever $i = 1, 2, \ldots, n - 1$, the quotient

$$\frac{a_{i+1}}{a_i} = r$$

has the same value. The number $r$ is described as the ratio of the geometric progression. For example, the sequence 1, 2, 4, 8, 16 is a geometric progression of five terms with the ratio $r = 2$. Similarly, 27, 9, 3, 1, 1/3 is a geometric progression with ratio equal to 1/3.

Many problems of probability need formulae relating to geometric progressions. One of these formulae expresses the $i$th term $a_i$ of the progression in a relation involving only the first term $a_1$ and the ratio $r$. The other formula connects the sum of the first $n$ terms of the progression with the values of $a_1$ and $r$.

In order to obtain the first of these formulae, we apply the definition of the geometric progression and write

$$a_2 = a_1 r, \qquad a_3 = a_2 r, \qquad \ldots, \qquad a_i = a_{i-1} r.$$

Multiplying these formulae and cancelling the products $a_2 a_3 \cdots a_{i-1}$ from both sides we obtain the desired result

$$a_i = a_1 r^{i-1}.$$

Thus, the $i$th term of a geometric progression is equal to the first term multiplied by the $(i - 1)$st power of the ratio. In particular, we have

$$a_n = a_1 r^{n-1}.$$


In order to compute

$$S_n = a_1 + a_2 + \cdots + a_n = \sum_{k=1}^{n} a_k$$

we again write the column

$$\begin{aligned}
a_2 &= a_1 r, \\
a_3 &= a_2 r, \\
&\;\;\vdots \\
a_k &= a_{k-1} r, \\
&\;\;\vdots \\
a_n &= a_{n-1} r
\end{aligned}$$

and sum. On the left hand side we obtain $S_n - a_1$. The sum of terms on the right hand side is $(S_n - a_n) r$. Thus

$$S_n - a_1 = (S_n - a_n) r = (S_n - a_1 r^{n-1}) r.$$

Solving this equation for $S_n$ (which requires $r \neq 1$) we obtain the second of the two formulae sought,

$$S_n = a_1 \frac{1 - r^n}{1 - r}.$$

It follows that the sum of the five terms of the geometric progression with $a_1 = 27$ and the ratio $r = 1/3$ is equal to

$$S_5 = 27 \cdot \frac{1 - (1/3)^5}{1 - 1/3} = 40\tfrac{1}{3}.$$
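Both formulae are easy to check with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction` so that the example 27, 9, 3, 1, 1/3 is reproduced without rounding; the function names are hypothetical, and the case $r = 1$, excluded by the division in the derivation, is handled separately.

```python
from fractions import Fraction

def gp_term(a1, r, i):
    """i-th term of a geometric progression: a_i = a_1 * r**(i - 1)."""
    return a1 * r ** (i - 1)

def gp_sum(a1, r, n):
    """Sum of the first n terms: S_n = a_1 * (1 - r**n) / (1 - r) when r != 1."""
    if r == 1:
        return n * a1                   # every term equals a_1
    return a1 * (1 - r ** n) / (1 - r)

a1, r = Fraction(27), Fraction(1, 3)    # the progression 27, 9, 3, 1, 1/3
terms = [gp_term(a1, r, i) for i in range(1, 6)]
print(*terms)                           # 27 9 3 1 1/3
print(gp_sum(a1, r, 5))                 # 121/3, that is, 40 1/3
```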

2·8·4. Relations between the net and crude rates of risks. Now we may proceed to the computation of the crude rates of the three risks involved in the problem of the "follow up" of outgoing patients. The simplest crude rate to compute is $Q_0 = 1 - Q_1 - Q_2 - Q_3$. $Q_0$ is the probability that the outgoing patient will, during time $T$, escape both relapse and death from causes other than the disease $D$. In the model adopted, this is the probability that the ball drawn on the $M$th draw will bear a zero. For this to occur it is necessary and sufficient that each of the $M$ consecutive draws yield a ball marked zero. The multiplication theorem gives

$$Q_0 = (1 - q_{01} - q_{02})^M. \tag{2·8·1}$$

Next easiest to compute is the crude rate $Q_1$ of death from causes other than the disease $D$. Within our model this is the probability that the last of the $M$ consecutive draws yields a ball bearing No. 1. It is easy to see that the property "last ball No. 1" is equivalent to the following logical sum of exclusive properties:

$$\begin{aligned}
(\text{last ball No. 1}) \equiv{} & (\text{1st ball No. 1}) \\
& + (\text{1st ball No. 0})(\text{2nd ball No. 1}) \\
& + (\text{1st 2 balls No. 0})(\text{3rd ball No. 1}) \\
& + \cdots \\
& + (\text{1st } M-1 \text{ balls No. 0})(M\text{th ball No. 1}) \\
={} & (\text{1st ball No. 1}) + \sum_{k=2}^{M} (\text{1st } k-1 \text{ balls No. 0})(k\text{th ball No. 1}).
\end{aligned}$$


Applying the addition and multiplication theorems, we obtain

$$Q_1 = P\{\text{last ball No. 1}\} = q_{01} + \sum_{k=2}^{M} (1 - q_{01} - q_{02})^{k-1} q_{01} = q_{01} \sum_{k=1}^{M} (1 - q_{01} - q_{02})^{k-1}.$$

The sum on the right hand side is the sum of $M$ terms of the geometric progression which has its first term equal to unity and its ratio equal to $(1 - q_{01} - q_{02})$. Therefore, using the result of the preceding subsection, we may write

$$\sum_{k=1}^{M} (1 - q_{01} - q_{02})^{k-1} = \frac{1 - (1 - q_{01} - q_{02})^M}{q_{01} + q_{02}}.$$

It follows that

$$Q_1 = \frac{q_{01}}{q_{01} + q_{02}} \left[1 - (1 - q_{01} - q_{02})^M\right] = \frac{q_{01}}{q_{01} + q_{02}} (1 - Q_0). \tag{2·8·2}$$
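A quick numerical check of the last step: the sketch (hypothetical names, illustrative values) evaluates $Q_1$ once as the sum of $M$ terms given by the addition and multiplication theorems and once by (2·8·2).

```python
def crude_q1_by_sum(q01, q02, M):
    """Q_1 = q01 * sum_{k=1}^{M} (1 - q01 - q02)**(k - 1), term by term."""
    return q01 * sum((1 - q01 - q02) ** (k - 1) for k in range(1, M + 1))

def crude_q1_closed(q01, q02, M):
    """Formula (2·8·2): Q_1 = q01 / (q01 + q02) * (1 - Q_0), with Q_0 from (2·8·1)."""
    Q0 = (1 - q01 - q02) ** M
    return q01 / (q01 + q02) * (1 - Q0)

print(crude_q1_by_sum(0.02, 0.03, 12), crude_q1_closed(0.02, 0.03, 12))   # should agree
```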


Now we proceed to the computation of $Q_2$. This is the probability that the outgoing patient will survive time $T$ but will suffer a relapse of the disease $D$. Also, $Q_2$ is the probability that the last of the $M$ draws described in subsection 2·8·2 will yield a ball No. 2. This particular property of the outcome of the $M$ consecutive draws is equivalent to the following logical sum of exclusive properties,

$$\begin{aligned}
(\text{last ball No. 2}) \equiv{} & (\text{1st ball No. 2})(M-1 \text{ subsequent balls No. 2}) \\
& + (\text{1st ball No. 0})(\text{2nd ball No. 2})(M-2 \text{ subsequent balls No. 2}) \\
& + (\text{1st 2 balls No. 0})(\text{3rd ball No. 2})(M-3 \text{ subsequent balls No. 2}) \\
& + \cdots \\
& + (\text{1st } M-2 \text{ balls No. 0})(M-1\text{st ball No. 2})(M\text{th ball No. 2}) \\
& + (\text{1st } M-1 \text{ balls No. 0})(M\text{th ball No. 2}) \\
={} & (\text{1st ball No. 2})(M-1 \text{ subsequent balls No. 2}) \\
& + \sum_{k=1}^{M-2} (\text{1st } k \text{ balls No. 0})(k+1\text{st ball No. 2})(M-k-1 \text{ subsequent balls No. 2}) \\
& + (\text{1st } M-1 \text{ balls No. 0})(M\text{th ball No. 2}).
\end{aligned}$$

Applying the addition and multiplication theorems, we obtain

$$Q_2 = q_{02}(1 - q_{23})^{M-1} + \sum_{k=1}^{M-2} (1 - q_{01} - q_{02})^k q_{02} (1 - q_{23})^{M-k-1} + (1 - q_{01} - q_{02})^{M-1} q_{02}.$$

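This formula may be transformed into the following simpler formula

$$Q_2 = q_{02} (1 - q_{23})^{M-1} \sum_{k=0}^{M-1} \left(\frac{1 - q_{01} - q_{02}}{1 - q_{23}}\right)^k.$$

The sum on the right hand side is the sum of $M$ terms of the geometric progression which has its first term equal to unity and its ratio equal to

$$\frac{1 - q_{01} - q_{02}}{1 - q_{23}}.$$

Thus, using the results of the preceding subsection, we have

$$\begin{aligned}
Q_2 &= q_{02}(1 - q_{23})^{M-1} \, \frac{1 - \left(\dfrac{1 - q_{01} - q_{02}}{1 - q_{23}}\right)^M}{1 - \dfrac{1 - q_{01} - q_{02}}{1 - q_{23}}} \\
&= \frac{q_{02}}{q_{01} + q_{02} - q_{23}} \left[(1 - q_{23})^M - (1 - q_{01} - q_{02})^M\right] \\
&= \frac{q_{02}}{q_{01} + q_{02} - q_{23}} \left[1 - Q_0 - P_{23}\right].
\end{aligned}$$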
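The same kind of check for $Q_2$, again with hypothetical names and illustrative values. The first function adds the exclusive cases according to the draw on which the relapse occurs; the second uses the closed form, which divides by $q_{01} + q_{02} - q_{23}$ and therefore assumes that this quantity is not zero.

```python
def crude_q2_by_cases(q01, q02, q23, M):
    """Q_2 as a sum over the draw k + 1 on which the relapse occurs (k = 0, ..., M - 1)."""
    a = 1 - q01 - q02          # probability of ball No. 0 from a bag B_0(k)
    b = 1 - q23                # probability of ball No. 2 from a bag B_2(k)
    return sum(a ** k * q02 * b ** (M - k - 1) for k in range(M))

def crude_q2_closed(q01, q02, q23, M):
    """q02 / (q01 + q02 - q23) * [(1 - q23)**M - (1 - q01 - q02)**M]; needs q01 + q02 != q23."""
    return q02 / (q01 + q02 - q23) * ((1 - q23) ** M - (1 - q01 - q02) ** M)

print(crude_q2_by_cases(0.02, 0.03, 0.10, 12), crude_q2_closed(0.02, 0.03, 0.10, 12))
```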


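$Q_3$, the crude rate of the risk of death during time $T$ following a relapse, remains to be evaluated. Within the model adopted $Q_3$ is the probability that the $M$th draw will yield a ball bearing the number 3. This property of the outcome of $M$ consecutive draws is equivalent to the logical sum of the following exclusive properties:

$$\begin{aligned}
(\text{last ball No. 3}) \equiv{} & (\text{1st ball No. 2})\left(\overline{M-1 \text{ subsequent balls No. 2}}\right) \\
& + (\text{1st ball No. 0})(\text{2nd ball No. 2})\left(\overline{M-2 \text{ subsequent balls No. 2}}\right) \\
& + (\text{1st 2 balls No. 0})(\text{3rd ball No. 2})\left(\overline{M-3 \text{ subsequent balls No. 2}}\right) \\
& + \cdots \\
& + (\text{1st } M-2 \text{ balls No. 0})(M-1\text{st ball No. 2})(M\text{th ball No. 3}).
\end{aligned}$$

In interpreting this formula the reader will remember that the bar above the symbol of a property signifies the negation of this property. The above formula may be rewritten as

$$(\text{last ball No. 3}) \equiv (\text{1st ball No. 2})\left(\overline{M-1 \text{ subsequent balls No. 2}}\right) + \sum_{k=1}^{M-2} (\text{1st } k \text{ balls No. 0})(k+1\text{st ball No. 2})\left(\overline{M-k-1 \text{ subsequent balls No. 2}}\right).$$

Application of the addition and multiplication theorems gives

$$\begin{aligned}
Q_3 = P\{\text{last ball No. 3}\} ={} & q_{02}\, P\left\{\overline{M-1 \text{ subsequent balls No. 2}} \;\middle|\; \text{1st ball No. 2}\right\} \\
& + \sum_{k=1}^{M-2} (1 - q_{01} - q_{02})^k q_{02}\, P\left\{\overline{M-k-1 \text{ subsequent balls No. 2}} \;\middle|\; k+1\text{st ball No. 2}\right\}.
\end{aligned}$$

The conditional probability on the right hand side is easy to compute as one minus the probability, given that the $k+1$st ball is No. 2, that all the subsequent draws will yield balls No. 2,

$$P\left\{\overline{M-k-1 \text{ subsequent balls No. 2}} \;\middle|\; k+1\text{st ball No. 2}\right\} = 1 - (1 - q_{23})^{M-k-1}.$$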




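It follows that

$$Q_3 = q_{02}\left[1 - (1 - q_{23})^{M-1}\right] + q_{02} \sum_{k=1}^{M-2} (1 - q_{01} - q_{02})^k \left[1 - (1 - q_{23})^{M-k-1}\right].$$

Easy algebra gives

$$\begin{aligned}
Q_3 &= q_{02} \sum_{k=0}^{M-2} (1 - q_{01} - q_{02})^k - q_{02}(1 - q_{23})^{M-1} \sum_{k=0}^{M-2} \left(\frac{1 - q_{01} - q_{02}}{1 - q_{23}}\right)^k \\
&= q_{02}\, \frac{1 - (1 - q_{01} - q_{02})^{M-1}}{q_{01} + q_{02}} - q_{02}(1 - q_{23})^{M-1} \, \frac{1 - \left(\dfrac{1 - q_{01} - q_{02}}{1 - q_{23}}\right)^{M-1}}{1 - \dfrac{1 - q_{01} - q_{02}}{1 - q_{23}}} \\
&= \frac{q_{02}}{q_{01} + q_{02}} \left[1 - (1 - q_{01} - q_{02})^{M-1}\right] - \frac{q_{02}(1 - q_{23})}{q_{01} + q_{02} - q_{23}} \left[(1 - q_{23})^{M-1} - (1 - q_{01} - q_{02})^{M-1}\right] \\
&= \frac{q_{02}}{q_{01} + q_{02}} + \frac{q_{02} q_{23}}{(q_{01} + q_{02})(q_{01} + q_{02} - q_{23})} (1 - q_{01} - q_{02})^M - \frac{q_{02}}{q_{01} + q_{02} - q_{23}} (1 - q_{23})^M,
\end{aligned}$$

or, using the expressions for $Q_0$ and $P_{23}$,

$$Q_3 = \frac{q_{02}}{q_{01} + q_{02}} + \frac{q_{02} q_{23} Q_0}{(q_{01} + q_{02})(q_{01} + q_{02} - q_{23})} - \frac{q_{02}}{q_{01} + q_{02} - q_{23}} (1 - P_{23}).$$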

Using these formulae the reader will have no difficulty in checking that

$$Q_2 + Q_3 = \frac{q_{02}}{q_{01} + q_{02}} (1 - Q_0). \tag{2·8·3}$$
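The four crude rates can be put together in one function and checked against each other. In this sketch (hypothetical names; constant transition probabilities with $q_{01} + q_{02} \neq q_{23}$ assumed) the first printed number should be one up to rounding, and the next two should be the two sides of (2·8·3).

```python
def crude_rates(q01, q02, q23, M):
    """Q_0, Q_1, Q_2, Q_3 from (2·8·1), (2·8·2) and the formulae for Q_2 and Q_3.

    Assumes q01 + q02 != q23, since the closed forms divide by q01 + q02 - q23.
    """
    s = q01 + q02
    d = s - q23
    Q0 = (1 - s) ** M
    P23 = 1 - (1 - q23) ** M
    Q1 = q01 / s * (1 - Q0)
    Q2 = q02 / d * (1 - Q0 - P23)
    Q3 = q02 / s + q02 * q23 / (s * d) * Q0 - q02 / d * (1 - P23)
    return Q0, Q1, Q2, Q3

Q0, Q1, Q2, Q3 = crude_rates(0.02, 0.03, 0.10, M=12)
print(Q0 + Q1 + Q2 + Q3)                          # 1 up to rounding error
print(Q2 + Q3, 0.03 / (0.02 + 0.03) * (1 - Q0))   # the two sides of (2·8·3)
```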

In order to obtain formulae connecting the net and the crude rates of risk, we express $q_{01}$, $q_{02}$, and $q_{23}$ in terms of these rates. Using formula (2·8·1) we obtain

$$q_{01} + q_{02} = 1 - Q_0^{1/M}.$$

Substituting this result into (2·8·2) and (2·8·3), we find

$$q_{01} = \frac{Q_1}{1 - Q_0}\left(1 - Q_0^{1/M}\right), \qquad q_{02} = \frac{Q_2 + Q_3}{1 - Q_0}\left(1 - Q_0^{1/M}\right).$$

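These formulae can now be substituted into the expressions for $P_{01}$ and $P_{02}$. We have

$$P_{01} = 1 - (1 - q_{01})^M = 1 - \left[1 - \frac{Q_1}{1 - Q_0}\left(1 - Q_0^{1/M}\right)\right]^M, \tag{2·8·4}$$

$$P_{02} = 1 - (1 - q_{02})^M = 1 - \left[1 - \frac{Q_2 + Q_3}{1 - Q_0}\left(1 - Q_0^{1/M}\right)\right]^M. \tag{2·8·5}$$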

Since $Q_0 = 1 - Q_1 - Q_2 - Q_3$, it is seen that the value of $Q_1$ and the value of the sum $Q_2 + Q_3$ determine uniquely the net rates $P_{01}$ and $P_{02}$. In order to determine $P_{23}$ the values of $Q_2$ and $Q_3$ taken separately will be needed. Using the expression for $Q_2$ we have

$$Q_2(q_{01} + q_{02} - q_{23}) = q_{02}(1 - Q_0 - P_{23}).$$

Since

$$q_{23} = 1 - (1 - P_{23})^{1/M},$$

this equation can be rewritten as

$$Q_2(1 - P_{23})^{1/M} = Q_2(1 - q_{01} - q_{02}) + q_{02}(1 - Q_0) - q_{02} P_{23}.$$

Dividing by $Q_2$ and substituting the expressions of $q_{01}$ and $q_{02}$ in terms of the crude rates, we obtain the equation

$$(1 - P_{23})^{1/M} = 1 + \frac{Q_3\left(1 - Q_0^{1/M}\right)}{Q_2} - \frac{Q_2 + Q_3}{Q_2} \cdot \frac{1 - Q_0^{1/M}}{1 - Q_0} \, P_{23}, \tag{2·8·6}$$

which determines the value of $P_{23}$. Unfortunately this equation is complicated and no explicit solutions can be given. However, an approximate solution may be obtained graphically.

Let

$$y(P) = (1 - P)^{1/M}, \qquad z(P) = 1 + \left(1 - Q_0^{1/M}\right)\left[\frac{Q_3}{Q_2} - \frac{Q_2 + Q_3}{Q_2(1 - Q_0)} P\right].$$




The solution $P_{23}$ sought is that value of $P$ for which $y(P) = z(P)$. Knowing $M$, a graph of $y(P)$ can be prepared for $0 < P < 1$. This is represented by a curve. Next we use the given values of $Q_1$, $Q_2$, $Q_3$ and $Q_0 = 1 - Q_1 - Q_2 - Q_3$ and compute the two terms in brackets in the expression for $z(P)$. Now $z(P)$ can be plotted against $P$ and the plot is a straight line. If the plot is made accurately, the point where the line touches the curve determines $P_{23}$. Unfortunately, with large values of $M$, the graph of $y(P)$ is a very flat curve over most of the range of $P$ and, therefore, the graphical method gives but a rough estimate of $P_{23}$. After the first approximation to $P_{23}$ is obtained, it can be improved by applying the trial and error method to formula (2·8·6).
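The trial and error step can be automated. The sketch below is one possible method, not the book's: rather than intersecting $y(P)$ and $z(P)$, it uses bisection to find the $q_{23}$ that reproduces the observed $Q_2$ through $Q_2 = q_{02}\sum_{k=0}^{M-1}(1 - q_{01} - q_{02})^k(1 - q_{23})^{M-k-1}$, which decreases steadily as $q_{23}$ grows, and then puts $P_{23} = 1 - (1 - q_{23})^M$. Working with $q_{23}$ also avoids a trap: equation (2·8·6) is satisfied by $P = 1 - Q_0$ whatever the crude rates (there $q_{23} = q_{01} + q_{02}$, and the factor $q_{01} + q_{02} - q_{23}$ used in deriving the equation vanishes), so the graphs of $y(P)$ and $z(P)$ always meet at that point, which does not in general give $P_{23}$. The function names and the tolerance are hypothetical.

```python
def p23_from_crude(Q1, Q2, Q3, M, tol=1e-13):
    """Net rate P_23 from the crude rates (elements of time), by bisection on q23."""
    Q0 = 1 - Q1 - Q2 - Q3
    a = Q0 ** (1 / M)                        # a = 1 - q01 - q02, from (2·8·1)
    q02 = (Q2 + Q3) * (1 - a) / (1 - Q0)     # from (2·8·1) and (2·8·3)

    def q2_given(q23):
        # Q2 = q02 * sum_{k=0}^{M-1} a**k * (1 - q23)**(M - k - 1); decreases as q23 grows
        b = 1 - q23
        return q02 * sum(a ** k * b ** (M - k - 1) for k in range(M))

    if not (q2_given(1.0) <= Q2 <= q2_given(0.0)):
        raise ValueError("no q23 between 0 and 1 reproduces these crude rates")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if q2_given(mid) > Q2:
            lo = mid        # Q2 still too large: q23 must be larger
        else:
            hi = mid
    q23 = (lo + hi) / 2
    return 1 - (1 - q23) ** M

# Crude rates produced draw by draw from illustrative q01 = .02, q02 = .03, q23 = .10, M = 12.
p = [1.0, 0.0, 0.0, 0.0]
for _ in range(12):
    p = [p[0] * 0.95, p[1] + p[0] * 0.02, p[2] * 0.90 + p[0] * 0.03, p[3] + p[2] * 0.10]
print(p23_from_crude(p[1], p[2], p[3], M=12), 1 - 0.90 ** 12)   # should agree closely
```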

Formulae (2·8·4) to (2·8·6) can be used to determine the net rates of the two direct risks whenever the crude rates $Q_1$, $Q_2$ and $Q_3$ are given. One word of caution is necessary. Formulae (2·8·4) and (2·8·5) are valid whatever the non-negative values of $Q_1$, $Q_2$ and $Q_3$, provided their sum is less than one. However, the value of $P_{23}$ can be found only if $Q_2$ is greater than zero. The reason is that for $Q_3$ to be a positive number it is necessary that $q_{02} > 0$. This requires that $Q_2$ be positive even if $q_{23} = 1$. It is interesting to notice that when $q_{23} = 1$, then the probability that the $M$ consecutive draws of our model will end with drawing a ball No. 2 is equal to the probability that the first $M - 1$ draws all yield balls No. 0 and that these will be followed by a ball No. 2. This probability is $(1 - q_{01} - q_{02})^{M-1} q_{02}$. In terms of relapse and death, the crude rate $Q_2$ is, in this case, the probability that the outgoing patient will stay healthy for $M - 1$ time elements and then have a relapse in the last time element of the period $T$. The assumption that $q_{23} = 1$ implies that in the immediately following time element this patient will die, that is, he will die during $\tau_{M+1}$. However, at the end of the $M$th time element he was still alive.

The problems related to the above theory may be of two kinds: (a) given the crude rates, find the net rates of the risks involved and (b) given the net rates, determine the crude rates. Thus far, our attention has been concentrated on problems of type (a). However, problems of type (b) are easily solved by noticing that, for $i, j = 0, 1, 2, 3$, $i \neq j$,

$$q_{ij} = 1 - (1 - P_{ij})^{1/M},$$

and then using the expressions for $Q_0$, $Q_1$, $Q_2$, and $Q_3$ in terms of $q_{01}$, $q_{02}$, and $q_{23}$.
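A sketch of the procedure for problems of type (b), with hypothetical names. It converts each net rate into a transition probability and then applies (2·8·1), (2·8·2), and (2·8·3); for $Q_2$ it uses the sum $q_{02}\sum_{k=0}^{M-1}(1 - q_{01} - q_{02})^k (1 - q_{23})^{M-k-1}$, which equals the expression obtained above but stays usable when $q_{01} + q_{02} = q_{23}$.

```python
def crude_rates_from_net(P01, P02, P23, M):
    """Problems of type (b): crude rates Q_0, ..., Q_3 from the net rates (elements of time)."""
    q01, q02, q23 = (1 - (1 - P) ** (1 / M) for P in (P01, P02, P23))
    a, b = 1 - q01 - q02, 1 - q23
    Q0 = a ** M                                        # (2·8·1)
    Q1 = q01 / (q01 + q02) * (1 - Q0)                  # (2·8·2)
    Q2 = q02 * sum(a ** k * b ** (M - k - 1) for k in range(M))
    Q3 = q02 / (q01 + q02) * (1 - Q0) - Q2             # from (2·8·3)
    return Q0, Q1, Q2, Q3

print(crude_rates_from_net(1 - 0.98 ** 12, 1 - 0.97 ** 12, 1 - 0.90 ** 12, M=12))
```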

When $M$ is large, it is necessary to use logarithms with many decimals to obtain reasonable accuracy and the computations involved are unpleasant. Moreover, when $M$ is substantial, the final result does not depend very much on the value of $M$. The reason for this will be apparent after studying subsection 4·6·1 in Chapter 4.

Some of the problems at the end of this section relating to the evaluation of competing risks correspond exactly to the pattern discussed in this section, with two risks $R_{01}$ and $R_{02}$ to which the individual is directly exposed and with one additional risk $R_{23}$ which is contingent on $R_{02}$. Some other problems are simpler and do not involve any contingent risk. In order to treat problems of the latter type it is sufficient to put $q_{23} = Q_3 = P_{23} = 0$. In a similar manner one can reduce the number of direct risks to one by postulating that the corresponding probability of passage is equal to zero. Notice that some of the formulae are collected together in (2·8·8) and (2·8·9) below.

Upon inspecting the problems given at the end of this section, the reader will realize that the range of application of the theory outlined is extremely broad and interesting. Unfortunately, a great many important problems (not those given at the end of this section) require a somewhat more complicated scheme than that of the outgoing patient. The simplicity of the latter scheme depends essentially upon the fact that the individual can leave the state $S_0$ (and also the state $S_2$) at any time element, but once gone, he cannot return. As was already mentioned, this detail contributes to the artificiality of the mathematical model. However, the solution of more realistic problems, where returns to the original state are possible, requires more mathematical tools than the reader of the present book is expected to have at his disposal.

★2·8·5. Problem of competing risks. More realistic treatment. In the treatment of the problem of competing risks we used only those mathematical tools which usually are acquired in high school. This made it necessary to introduce into the model a somewhat artificial concept of an element of time of fixed duration $\tau$. The reader will have noticed that the formulae deduced do not depend on $\tau$ directly but only through $M$, the number of time elements in the period $T$ covered by observation. It was noticed also that the outcome of the various computations becomes less and less sensitive to changes in $M$ when $M$ is large. In the present subsection we shall present a somewhat more realistic treatment of the same problem based on a passage to the limit where $M$ tends to infinity while the time element $\tau$ tends to zero. To follow the discussion below, the reader needs only the concept of limit and the formula

$$\lim_{x \to \infty} \left(1 + \frac{a}{x}\right)^x = e^a, \tag{2·8·7}$$

where $a$ stands for an arbitrary number and $e = 2.71828\ldots$ is the base of the natural logarithms. Both the concept of limit and formula (2·8·7) are discussed in subsection 4·5·1 of Chapter 4. We shall limit our study to the specific problem of the outgoing patient, that is, to the scheme with four possible states $S_0, S_1, S_2, S_3$, with three possible direct risks of transfer $R_{01}$, $R_{02}$, and $R_{23}$ and with one risk $R_{03}$ contingent on $R_{02}$. Furthermore, we shall assume that the intensity of risks does not change with time.

★ Sections or parts of sections preceded by a star are of a more advanced nature and may be omitted without breaking the continuity of presentation.
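Formula (2·8·7) can be watched at work numerically. In this small sketch the value $a = -1.5$ is arbitrary; the powers approach $e^a$ as $x$ grows.

```python
import math

a = -1.5                                    # any fixed number will do
for x in (10, 100, 1_000, 1_000_000):
    print(x, (1 + a / x) ** x)
print("e**a =", math.exp(a))
```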

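We begin by collecting together the formulae for the four crude and three net rates of risks deduced in the preceding subsection

$$\begin{aligned}
Q_0 &= (1 - q_{01} - q_{02})^M, \\
Q_1 &= \frac{q_{01}}{q_{01} + q_{02}} (1 - Q_0), \\
Q_2 &= \frac{q_{02}}{q_{01} + q_{02} - q_{23}} \left[(1 - q_{23})^M - (1 - q_{01} - q_{02})^M\right], \\
Q_3 &= \frac{q_{02}}{q_{01} + q_{02}} + \frac{q_{02} q_{23}}{(q_{01} + q_{02})(q_{01} + q_{02} - q_{23})} (1 - q_{01} - q_{02})^M - \frac{q_{02}}{q_{01} + q_{02} - q_{23}} (1 - q_{23})^M,
\end{aligned} \tag{2·8·8}$$

$$P_{01} = 1 - (1 - q_{01})^M, \qquad P_{02} = 1 - (1 - q_{02})^M, \qquad P_{23} = 1 - (1 - q_{23})^M. \tag{2·8·9}$$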

★Here $M$ represents the number of time elements $\tau$ within the period of observation $T$ and $q_{01}$, $q_{02}$, and $q_{23}$ the probabilities of the transfer indicated by the subscripts during the time element $\tau = T/M$. Naturally, the smaller the time element contemplated for a given problem, the smaller the transition probabilities $q_{01}$, $q_{02}$, and $q_{23}$ must be. Also, the smaller $\tau$, the less objectionable is the postulate of elements of time. For this reason we shall put

$$\begin{aligned}
q_{01} M &= \lambda_1, &\text{so that}\quad q_{01} &= \lambda_1 / M, \\
q_{02} M &= \lambda_2, &\text{so that}\quad q_{02} &= \lambda_2 / M, \\
q_{23} M &= \lambda_3, &\text{so that}\quad q_{23} &= \lambda_3 / M,
\end{aligned} \tag{2·8·10}$$

and shall postulate that the actual phenomenon of competing risks corresponds not to formulae (2·8·8) and (2·8·9) but to the limiting expressions of these formulae computed by assuming that $M \to \infty$ while $\lambda_1$, $\lambda_2$, and $\lambda_3$ are held constant.




★In order to distinguish this postulate from the postulate of elements of time, we shall label it the postulate of continuity of time. The crude and the net rates of risks computed assuming the postulate of continuity of time will be denoted by the bold face letters $\mathbf{Q}$ and $\mathbf{P}$, respectively. Thus, whatever $i, j = 0, 1, 2, 3$,

$$\mathbf{Q}_j = \lim_{M \to \infty} Q_j, \qquad \mathbf{P}_{ij} = \lim_{M \to \infty} P_{ij}.$$

★Formulae for the net rates of risks, based on the postulate of continuity of time, are easily obtained from (2·8·9), (2·8·10), and (2·8·7). In all three cases the passage to the limit required is exactly of the kind given in (2·8·7) with $a = -\lambda_i$ and $x = M$. Thus, we obtain

$$\mathbf{P}_{01} = 1 - e^{-\lambda_1}, \qquad \mathbf{P}_{02} = 1 - e^{-\lambda_2}, \qquad \mathbf{P}_{23} = 1 - e^{-\lambda_3}. \tag{2·8·11}$$
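A sketch comparing the net rate under the postulate of elements of time, with $q = \lambda/M$, to its limit (2·8·11); the value of $\lambda_1$ and the function names are illustrative assumptions.

```python
import math

def net_rate_discrete(lam, M):
    """Net rate 1 - (1 - lam / M)**M under the postulate of elements of time, q = lam / M."""
    return 1 - (1 - lam / M) ** M

def net_rate_continuous(lam):
    """Limit (2·8·11): P = 1 - exp(-lam)."""
    return 1 - math.exp(-lam)

lam1 = 0.9                                  # illustrative value of lambda_1
for M in (10, 100, 10_000):
    print(M, net_rate_discrete(lam1, M))
print("limit:", net_rate_continuous(lam1))
```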

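Further, substituting (2·8·10) into the formulae (2·8·8), we obtain

$$\begin{aligned}
\mathbf{Q}_0 &= e^{-\lambda_1 - \lambda_2}, \\
\mathbf{Q}_1 &= \frac{\lambda_1}{\lambda_1 + \lambda_2} (1 - \mathbf{Q}_0), \\
\mathbf{Q}_2 &= \frac{\lambda_2}{\lambda_1 + \lambda_2 - \lambda_3} \left(e^{-\lambda_3} - e^{-\lambda_1 - \lambda_2}\right), \\
\mathbf{Q}_3 &= \frac{\lambda_2}{\lambda_1 + \lambda_2} + \frac{\lambda_2 \lambda_3}{(\lambda_1 + \lambda_2)(\lambda_1 + \lambda_2 - \lambda_3)} e^{-\lambda_1 - \lambda_2} - \frac{\lambda_2}{\lambda_1 + \lambda_2 - \lambda_3} e^{-\lambda_3}.
\end{aligned}$$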
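The limiting formulae can be coded directly. The sketch assumes $\lambda_1 + \lambda_2 \neq \lambda_3$ (two of the expressions divide by $\lambda_1 + \lambda_2 - \lambda_3$); the function name and the values of the $\lambda$'s are hypothetical. The crude rates should add up to one and, for large $M$, should be close to those produced by the draws with $q = \lambda/M$.

```python
import math

def crude_rates_continuous(lam1, lam2, lam3):
    """Crude rates Q_0, ..., Q_3 under the postulate of continuity of time.

    Assumes lam1 + lam2 != lam3, since two of the formulae divide by lam1 + lam2 - lam3.
    """
    s = lam1 + lam2
    d = s - lam3
    Q0 = math.exp(-s)
    Q1 = lam1 / s * (1 - Q0)
    Q2 = lam2 / d * (math.exp(-lam3) - Q0)
    Q3 = lam2 / s + lam2 * lam3 / (s * d) * Q0 - lam2 / d * math.exp(-lam3)
    return Q0, Q1, Q2, Q3

Q = crude_rates_continuous(0.3, 0.5, 1.2)   # illustrative intensities
print(Q, sum(Q))
```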

★The conditions of most problems involving risks are expressed either in terms of net rates or in terms of crude rates. Thus, in order to treat such problems it is necessary to obtain formulae linking the two kinds of rates without the intermediate constants $\lambda$. This is easily done by noticing, from (2·8·11), that

$$\lambda_1 = -\log_e(1 - \mathbf{P}_{01}), \qquad \lambda_2 = -\log_e(1 - \mathbf{P}_{02}), \qquad \lambda_3 = -\log_e(1 - \mathbf{P}_{23}).$$




Here $\log_e$ indicates logarithms to the base $e$ where, for any $x$,

$$\log_e x = \frac{\log_{10} x}{\log_{10} e} = \frac{\log_{10} x}{0.43429}.$$

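Using these formulae, we obtain

$$\begin{aligned}
\mathbf{Q}_0 &= (1 - \mathbf{P}_{01})(1 - \mathbf{P}_{02}), \\
\mathbf{Q}_1 &= \frac{\log_e(1 - \mathbf{P}_{01})}{\log_e(1 - \mathbf{P}_{01}) + \log_e(1 - \mathbf{P}_{02})} \left[1 - (1 - \mathbf{P}_{01})(1 - \mathbf{P}_{02})\right], \\
\mathbf{Q}_2 &= \frac{\log_e(1 - \mathbf{P}_{02})}{\log_e(1 - \mathbf{P}_{01}) + \log_e(1 - \mathbf{P}_{02}) - \log_e(1 - \mathbf{P}_{23})} \left[(1 - \mathbf{P}_{23}) - (1 - \mathbf{P}_{01})(1 - \mathbf{P}_{02})\right], \\
\mathbf{Q}_3 &= \frac{\log_e(1 - \mathbf{P}_{02})}{\log_e(1 - \mathbf{P}_{01}) + \log_e(1 - \mathbf{P}_{02})} \left[1 - (1 - \mathbf{P}_{01})(1 - \mathbf{P}_{02})\right] - \mathbf{Q}_2.
\end{aligned} \tag{2·8·12}$$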
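Formulae (2·8·12) in code, as a minimal sketch with a hypothetical function name. It assumes $0 < \mathbf{P}_{01}, \mathbf{P}_{02} < 1$, $0 \le \mathbf{P}_{23} < 1$ and $\mathbf{P}_{23} \neq 1 - (1 - \mathbf{P}_{01})(1 - \mathbf{P}_{02})$, so that no denominator vanishes; natural logarithms are taken with `math.log`. The usage line takes the net rates of Problem 1 below, with no contingent risk.

```python
import math

def crude_from_net_continuous(P01, P02, P23):
    """Formulae (2·8·12): crude rates Q_0, ..., Q_3 from the net rates (continuity of time)."""
    L1, L2, L3 = math.log(1 - P01), math.log(1 - P02), math.log(1 - P23)
    Q0 = (1 - P01) * (1 - P02)
    Q1 = L1 / (L1 + L2) * (1 - Q0)
    Q2 = L2 / (L1 + L2 - L3) * ((1 - P23) - Q0)
    Q3 = L2 / (L1 + L2) * (1 - Q0) - Q2
    return Q0, Q1, Q2, Q3

print(crude_from_net_continuous(0.6, 0.7, 0.0))
```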


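★Thus, if the net rates are given, the corresponding crude rates can be obtained from equations (2·8·12). In order to obtain formulae appropriate for computing the net rates of risks when the crude rates are given, formulae (2·8·12) must be solved with respect to the net rates. First observe that

$$\mathbf{Q}_2 + \mathbf{Q}_3 = \frac{\log_e(1 - \mathbf{P}_{02})}{\log_e(1 - \mathbf{P}_{01}) + \log_e(1 - \mathbf{P}_{02})} \left[1 - (1 - \mathbf{P}_{01})(1 - \mathbf{P}_{02})\right].$$

An easy combination of this result with formulae (2·8·12) gives

$$\mathbf{P}_{01} = 1 - \mathbf{Q}_0^{\mathbf{Q}_1/(1 - \mathbf{Q}_0)}, \qquad \mathbf{P}_{02} = 1 - \mathbf{Q}_0^{(\mathbf{Q}_2 + \mathbf{Q}_3)/(1 - \mathbf{Q}_0)}.$$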
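A round-trip sketch (hypothetical names): crude rates are generated from $\mathbf{P}_{01} = .6$ and $\mathbf{P}_{02} = .7$ with no contingent risk, using (2·8·12), and the two formulae above should return the same net rates.

```python
import math

def net_from_crude_continuous(Q1, Q2, Q3):
    """P_01 = 1 - Q_0**(Q_1 / (1 - Q_0)) and P_02 = 1 - Q_0**((Q_2 + Q_3) / (1 - Q_0))."""
    Q0 = 1 - Q1 - Q2 - Q3
    return 1 - Q0 ** (Q1 / (1 - Q0)), 1 - Q0 ** ((Q2 + Q3) / (1 - Q0))

# Crude rates from (2·8·12) for P01 = .6, P02 = .7 and P23 = 0 (illustrative).
L1, L2 = math.log(1 - 0.6), math.log(1 - 0.7)
Q0 = (1 - 0.6) * (1 - 0.7)
Q1 = L1 / (L1 + L2) * (1 - Q0)
Q2 = L2 / (L1 + L2) * (1 - Q0)
print(net_from_crude_continuous(Q1, Q2, 0.0))   # recovers .6 and .7 up to rounding
```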

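★As was the case in the preceding subsection, no explicit formula is given to compute the net rate $\mathbf{P}_{23}$. The reader will have no difficulty in finding the equation

$$-\log_e(1 - \mathbf{P}_{23}) = \frac{\mathbf{Q}_3}{\mathbf{Q}_2} \log_e \mathbf{Q}_0 - \frac{\mathbf{Q}_2 + \mathbf{Q}_3}{\mathbf{Q}_2(1 - \mathbf{Q}_0)} \log_e \mathbf{Q}_0 \cdot \mathbf{P}_{23},$$

which determines $\mathbf{P}_{23}$ in terms of the crude rates $\mathbf{Q}_0$, $\mathbf{Q}_2$, and $\mathbf{Q}_3$. As in subsection 2·8·4, we set

$$Y(P) = -\log_e(1 - P), \qquad Z(P) = \log_e \mathbf{Q}_0 \left[\frac{\mathbf{Q}_3}{\mathbf{Q}_2} - \frac{\mathbf{Q}_2 + \mathbf{Q}_3}{\mathbf{Q}_2(1 - \mathbf{Q}_0)} P\right].$$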




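The plot of Y(P) against P, for 0 < P < 1, can be made once and for all. Then, having the values of the crude rates, we compute the two terms in brackets in the expression of Z(P). The graph of Z(P) is a straight line which meets the graph of Y(P). Since Y(1 - Q0) = Z(1 - Q0) = -log Q0 whatever the crude rates, one common point of the two graphs is always P = 1 - Q0. The value P = P23 corresponds to the other common point. The two points coincide only when the line is tangent to the graph of Y(P).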
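The graphical procedure can be replaced by a numerical root search. The sketch below illustrates this with bisection on F(P) = Y(P) - Z(P); it is not the author's method. Because Y is convex and Z is a straight line, F decreases up to the point where its slope vanishes and increases after it. Since F(1 - Q0) = 0 for any crude rates, the search is confined to the branch of F that does not contain P = 1 - Q0. The function name and tolerances are assumptions.

```python
import math


def solve_p23(Q0, Q2, Q3, tol=1e-12):
    """Net rate P23 from crude rates, continuity of time.

    Solves Y(P) = Z(P) with Y(P) = -log(1 - P) and Z(P) = a + s*P, where
    a = (Q3/Q2) log Q0 and s = -(Q2 + Q3) log Q0 / ((1 - Q0) Q2).
    Assumes 0 < Q0 < 1, Q2 > 0 and Q3 >= 0.
    """
    a = (Q3 / Q2) * math.log(Q0)
    s = -(Q2 + Q3) * math.log(Q0) / ((1.0 - Q0) * Q2)   # slope of Z, always > 1

    def F(P):
        # Y - Z: convex on [0, 1), decreasing up to p_min and increasing after it
        return -math.log(1.0 - P) - (a + s * P)

    p_min = 1.0 - 1.0 / s      # where F'(P) = 1/(1 - P) - s vanishes
    extraneous = 1.0 - Q0      # F(1 - Q0) = 0 whatever the crude rates

    def bisect(lo, hi):
        # Assumes F(lo) and F(hi) do not share the same strict sign.
        f_lo = F(lo)
        if f_lo == 0.0:
            return lo
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            f_mid = F(mid)
            if (f_mid > 0.0) == (f_lo > 0.0):
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    if abs(extraneous - p_min) < 1e-9:
        return extraneous                  # Z tangent to Y: a single common point
    if extraneous > p_min:
        return bisect(0.0, p_min)          # wanted zero lies on the decreasing branch
    hi = 1.0 - 1e-6
    while F(hi) <= 0.0:                    # Y grows without bound as P -> 1
        hi = 1.0 - (1.0 - hi) / 10.0
    return bisect(p_min, hi)               # wanted zero lies on the increasing branch
```

As a check, generate crude rates from λ1 = 0.3, λ2 = 0.5 and λ3 = 0.4 using the formulas of this subsection. Solving gives back P23 = 1 - e^{-0.4} ≈ 0.3297.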

PROBLEMS AND EXERCISES 

1. It may be considered that a bombing plane during a sortie is exposed to the following competing risks: R00 of returning safely to the base, R01 of being shot down by flack and R02 of being shot down by enemy fighters. Assume that the length of combat during a sortie is T = 12 minutes and postulate the element of time τ = 1 second. For a given target, let the net rates of the two risks be P01 = .6 and P02 = .7.

(a) Give an interpretation of the probabilities P01 and P02 in operational terms. (b) Compute how frequently a bomber sent against this target will return safely to its base. (c) Define and compute the crude rates Q1 and Q2 of the two risks. Partial Answer: (b) 0.1198; (c) Q1 = 0.381, Q2 = 0.500.

2. Assume that during a bombing mission a bombing plane is exposed to the following direct risks: R00 of returning safely to the base, R01 of being shot down by flack, and R02 of being damaged by flack. Besides, there is a risk R23, contingent on R02, of being shot down by fighters after being damaged by flack. Notice that we assume that the risk of being shot down by fighters without prior damage by flack may be ignored. Assume that a long series of sorties of the specified type has established the following crude rates of the three risks: Q1 = .1, Q2 = .3, Q3 = .1. Give the definition of the crude and of the net rates of the above risks. Compute the net rates of the three risks using M = 200. Answer: P01 = 0.13, P02 = 0.43, P23 = 0.5.

3. In the general conditions of Problem 2, assume that the values of the net rates P01, P02, and P23 are as specified below. Further, assume that in the time available before an impending attack it is possible either to (i) increase the intensity of flack so that the net rates are increased to P'01 and P'02, respectively, or to (ii) increase the fighter defense so that the net rate of the third risk is at the level P'23. Assume that it is impossible to achieve (i) and (ii) simultaneously. Imagine that you are to give advice on the best method of defending the target. What are the probabilities that are particularly relevant? Which of the two policies (i) and (ii) would you recommend? Why? Your explanation should be based on the results of the relevant computations. In order to obtain numerical results, use the postulate of elements of time with M = 200.

(a) Let P01 = .2, P02 = .4, P23 = .05, P'01 = .4, P'02 = .6, and P'23 = .25.

(b) Let Poi ~ Af P 02 ~ .Oy P 23 ~ *7, Poi “ .2, P 02 ~ *7, and P 23 ~ * 8 . 

Partial Answer: (a) (ii), (b) (i). 




4. Answer the same questions as in Problem 3 but substitute the postulate of continuity of time for the postulate of time elements.

5. In 1899 the government of Shangri-la became very mindful of the health of its citizens and instituted complete health surveys to be carried out on January 1st of each year. The first two surveys established the following about all the citizens aged 40 who did not have any sign of cancer on January 1, 1900. Exactly 24.8 per cent contracted this disease during the following twelve months and exactly 49.6 per cent died from other causes. The corresponding figures obtained 50 years later are 25.5 and 28.1 per cent, respectively. The government of Shangri-la was pleased with the decrease in the mortality from causes other than cancer (from 49.6 to 28.1 per cent). However, it was alarmed by the contrasting slight increase in the incidence of cancer (from 24.8 per cent to 25.5 per cent).

Treat this situation as a problem of competing risks of intensities constant during the periods of 365 days. What are the risks involved? Which are direct and which are contingent risks? Give the numerical values of the crude and the net rates of these risks. Which of them are directly relevant to the problem facing the government of Shangri-la? Is there any special reason for alarm?

6. In Shangri-la, girl students registering in the State University are exposed to the following competing direct risks: R01 of losing interest in their courses so that they fail and are expelled from the University, and R02 of getting married. In addition, those girl students who succumb to risk R02 are exposed to the contingent risk R23 of being forced to take a full-time job (and hence abandon further studies) in order to support their husbands. The crude rates of the three risks are Q1 = .3, Q2 = .1, Q3 = .3. All these rates relate to the period of T = 4 years, and you may take one week as the element of time. How frequently does it happen that within four years following marriage the economic position of the husband becomes so bad that he has to rely on his wife's full-time employment?

7. After solving Problem 6 compute what would have been the relative 
frequency (is this the crude or the net rate of risk?) of women students’ 
leaving the University following marriage, assuming that there are no 
failures due to loss of interest in studies and that the intensities of the other 
risks remain the same as in Problem 6. 

8. Consider a problem of two competing risks R01 and R02, so that the possible states are S0, S1, and S2. The probabilities of passage are independent of time, with 0 < q00 < 1, 0 < q01 < 1, 0 < q02 < 1, and q11 = q22 = 1. In other words, during each element of time the individual in S0 can continue to stay in S0, can be transferred into S1, or can be transferred into S2. However, once he is in S1 or in S2 he must remain in this state indefinitely. Put M = 200 and solve the following problems:

(a) Invent a practical problem corresponding to the above scheme. Deduce formulae giving the net rates of the two risks R01 and R02 in terms of crude rates Q0, Q1, and Q2. Substitute Q1 = .3, Q2 = .4 and obtain the corresponding values of the net rates.

(b) It is sometimes contended that the net rate P01 of risk R01 coincides with a certain relative probability, say π1. This is the probability that an individual starting from S0 will be found after time T in S1, given that he is not in S0. Compute the probability π1 in terms of crude rates of risks and determine whether or not the above contention is true. What is the F.P.S. to which π1 refers? What is the F.P.S. to which P01 refers?

9. In order to judge the effectiveness of a treatment of tuberculosis, a hospital tries to "follow up" outgoing patients. It is considered that during the period T of one year each outgoing patient is subject to the following competing risks of constant intensity:
R01 = dying from causes other than tuberculosis;
R02 = escaping from observation by moving from his original home without leaving a forwarding address;
R03 = suffering a relapse of tuberculosis (with or without subsequent death).
Prolonged study gave the following crude rates of the three risks: Q1 = .05, Q2 = .40, Q3 = .10. Take the element of time as one day. Deduce the formulae connecting the net rates with the crude rates of the three risks, and obtain the value of the net rate of relapse P03.

10. A group of patients who received treatment T1 for disease D were observed once a week for four years. The records show that Q1, the crude rate of death from other causes, is .01; Q2, the crude rate of relapse of D, is .2; and Q3, the crude rate of death after relapse, is .1. Similar records for another group of patients who received treatment T2 give: Q1 = .15, Q2 = .15, Q3 = .15. However, the living conditions of the second group were much poorer than those of the first. Notice that Q2 + Q3, the total rate of relapse, is the same. You are a patient suffering from disease D. Assuming that living conditions do not affect the net rate of relapse, would you request treatment T1 or T2? Why?

Partial Answer: P03 = 0.30 and 0.35.



CHAPTER III. 


Probabilistic Problems of Genetics 


3·1. Outline of the laws of heredity

3·1·1. Introduction. The modern theory of hereditary phenomena, known as genetics, developed from the discovery of simple laws by Gregor Mendel about one hundred years ago. These laws are essentially probabilistic, i.e., they are expressed in terms of relative frequencies. They offer a very interesting and important field of application of the theory of probability. By now the elements of genetics form an essential part of general education and may be familiar to the reader of this book. Even so, a brief summary is likely to be useful. As with all observable phenomena, the hereditary processes are complex and the permanencies detected are subject to many exceptions. The purpose of the present summary is to furnish illustrative material for the theory of probability. Accordingly, this summary is of necessity schematic and simplified. It is limited to a few permanencies elevated to the rank of axioms, with the exceptions ignored. The reader will realize that a more realistic treatment of the problems of genetics would lead to modified probabilistic schemes and to more complex problems of probability.

Figure 5. Cell with two chromosomes A and B.
Figure 6. Same cell with broken chromosomes.

The organism does not inherit its external traits directly, but rather the capacity to react in a certain way to environmental conditions. The capacity for particular reactions is carried by hypothetical entities called genes. The genes are located in bodies called chromosomes, which are visible in the cells of the organism. The number of chromosomes per cell depends on the kind of organism: e.g., human body cells have 46 chromosomes. Apart from some exceptions, the chromosomes exist in pairs. Therefore, in the future we shall invariably speak of a pair of chromosomes as a single entity.



Figures 5 through 10 illustrate the process of formation and fertilization 
of reproductive cells. For simplicity, it is assumed that there are only 
two pairs of chromosomes in each cell. In the male organism we will denote 
these chromosomes by A and B and in the female organism by a and b. 



Figure 7. Recombination of Chromosomes. 


Each chromosome pair may be thought of as a double-barreled tube full of balls which represent particular genes. For each trait (or for each particular capacity to react) there are exactly two genes carried in one or the other of the chromosomes. These genes are located opposite each other in the two "barrels," and their location, or locus, along the chromosome is strictly fixed. Figure 5 represents the situation prior to the beginning of the formation of the reproductive cells. The first step in this process is represented in Figure 6. Each of the two chromosome pairs may break at some point, which varies from one chromosome pair to another and from one cell to another. Both barrels of the chromosome pair break at the same point. The broken parts of the barrels of chromosome pair A are marked A11, A12, A21, and A22, and similarly for chromosomes B. In some cases the chromosomes may break into more than two parts. However, only the simplest case of breaking is illustrated in Figures 6 and 7.

Figure 9. Maternal Reproductive Cell.
Figure 10. Fertilized Cell.

Figure 7 illustrates the following stage. It consists of the broken chromosome pairs splitting lengthwise, with the sections of single barrels floating separately. Gradually, the broken sections of barrels combine into pairs and the pairs separate towards the opposite poles of the cell. This stage is described as recombination of chromosome pairs. At this stage, one of two things happens, with equal frequencies. Either the two segments of the same barrel recombine as they were before breaking, as illustrated on chromosome pair A. Or the first section of one barrel combines with the second section of the other barrel, as illustrated on chromosome pair B. The final stage consists of the division of the original cell into two reproductive cells, as in Figure 8. Each reproductive cell has just one barrel of each chromosome pair. Fertilization consists in the combination of two reproductive cells, one from the paternal organism and the other from the maternal organism. Figure 9 represents one such maternal cell. In it, the single barrels of the two chromosome pairs are recombined from sections, say, a11 and a22, and b11 and b12, respectively. Obviously, the breaking points of the chromosomes in the maternal cell need not be the same as those in the paternal cell.

Figure 10 illustrates the maternal cell of Figure 9 fertilized by the first of 
two paternal reproductive cells of Figure 8. The corresponding barrels of 
the particular chromosome pairs come together and form ordinary double- 
barreled chromosome pairs. The life of the new organism, the progeny, 
begins with the formation of the fertilized cell. The cell grows and divides, 
producing two identical cells. These divide again and again, etc. 

In the above figures there are ten circles in each of the two barrels of a 
chromosome pair. These circles represent genes, or places or loci which 
may be occupied by genes. Although the actual number of genes in any 
given chromosome pair is not known, it is safe to presume that the number 
is much greater than ten, so that the figures reproduced simplify the 
situation very much. Also, as the student will surmise, the pictures of 
cells and chromosome pairs shown in Figures 5 through 10 bear little 
resemblance to what one sees under the microscope in actual cells.

Consider now a particular locus in a chromosome pair, for example, the two circles in chromosomes A marked α1 and α2. This locus is connected with some capacity or property of the organism. We may think, for example, of the color of sweet pea flowers. In ordinary conditions this color may be either red or white. Thus, there are in existence two different genes, say G and g, which may be carried in the locus (α1α2). One of these genes, g, tends to produce white flowers; the other, G, produces red flowers. Suppose a given plant carries two genes of the kind g (that is, it "has the genetical composition gg"). Then its flowers are white, and the progeny of this particular plant inherit only gene g. If a plant carries two genes G (that is, it "has the genetical composition GG"), then its flowers are red and all of its reproductive cells carry the gene G. A cross-fertilization of plants with genetical compositions gg and GG, respectively, must produce plants with the genetical composition gG. We denote this by writing gg × GG = gG. Organisms with two identical genes in a particular locus, either gg or GG, are called homozygous. Organisms with two different genes gG are called heterozygous or hybrids. In sweet peas the flowers of the hybrids are red, as in the homozygous plants GG. This circumstance is described by calling G the dominant gene and g the recessive. Also, the plants gg are occasionally called pure recessives and GG pure dominants.

The phenomenon of complete dominance of one gene over another, so 
that hybrids are indistinguishable from pure dominants, is fairly frequent. 
With some traits, however, the dominance is either incomplete or, perhaps, 
nonexistent, and the hybrids can be distinguished from the two homozygous types. In certain cases the hybrid type is intermediate between
the pure recessives and the pure dominants. The dominant genes are 
usually denoted by capital letters and the recessive by lower-case letters. 

In the case of sweet pea flowers, there are only two genes g and G 
known which may be carried in the particular locus which determines the 
color. In other words, whatever sweet pea plant we consider, this particular 
locus will contain one of the three combinations gg, gG or GG but no other 
gene. This is the simplest case. In other cases there are a number n > 2 of genes, say γ1, γ2, ···, γn. These may combine in pairs and be carried in the same locus by different organisms of the same kind. For example, the blood groups in man seem to depend on three different genes. We shall always consider the simplest case, when n = 2.

3·1·2. Notation. In the following we shall consider problems of the type: given that the parents (or the grandparents or the great-grandparents) possess some particular genetical composition, determine the probability that the progeny will inherit a specified combination of genes. For the solution of problems of this kind, it is convenient to adopt a special notation. The same holds for the formulation of axioms on which the solution can be based. Each of the parents and their offspring will be denoted by some letter, such as M for mother, F for father, and C for the child. Let ξ stand for a specified combination of genes. Then the assertion that the organism M possesses this particular combination will be denoted by M : ξ. Thus, for example, let g, G and h, H stand for some two pairs of genes. Then the symbol

$$P\{C : gG, hH \mid (M : gG, hH)(F : gg, hH)\}$$

will stand for the probability that the child C will inherit the combination gG, hH. This probability is taken given that the mother and the father have the combinations gG, hH and gg, hH, respectively.




In addition to the notation referring to complete organisms M, F, and C, it will be necessary to use some notation for reproductive cells. We shall frequently use the letters X and Y to denote the maternal and paternal reproductive cells which combine to produce the child C. For example, using this notation, we may write the formula

$$C : gg = (X : g)(Y : g),$$

which states an obvious fact. Consider the child C being a pure recessive with respect to the pair of genes g, G. For this it is necessary and sufficient that the maternal reproductive cell X and the paternal reproductive cell Y both contain the recessive gene g. In other words, the property C : gg is expressed as the logical product of the properties X : g and Y : g.

Similarly, the formula

$$C : gG = (X : g)(Y : G) + (X : G)(Y : g)$$

expresses the condition for the child to be a hybrid gG. It is necessary and sufficient that one reproductive cell (it is immaterial which) carries the recessive gene and the other the dominant gene.

3·1·3. Axioms. In Chapter 1 it was emphasized that, as treated in this book, the role of the theory of probability is a limited one. It consists in calculating the probabilities (= idealized relative frequencies) of certain events E from given or postulated probabilities of some other related events, say A, B, ···, C. It follows that, in order to apply the theory of probability to genetics, the probabilities of certain basic phenomena must be taken for granted (must be postulated). Then these postulates can be used to deduce the probabilities of other phenomena. Naturally, the deduced probabilities should be comparable to relative frequencies observable in actual experiments. For this, it is necessary to base the adopted axioms on a careful analysis of past observations. The description given in subsection 3·1·1 is brief and, of necessity, simplified. Even so, the reader will appreciate from it the vastness of the field of hereditary phenomena. The main purpose of discussing genetics is to provide interesting illustrations of the application of probability, so it is natural to introduce limitations. Thus we shall limit ourselves to studying the following categories of problems:
(i) inheritance of a single pair of genes;
(ii) inheritance of two pairs of genes carried in the same chromosome pair;
(iii) inheritance of an arbitrary number of pairs of genes, with the restriction that no two pairs are carried in the same chromosome pair (treated mainly in the second volume of the book).
Needless to say, it is impossible to cover the three categories of problems completely. The material chosen for consideration is that which seems particularly interesting. For a more complete and more realistic study of genetics, the reader is referred to the books [8] and [7] listed at the end of this chapter.

We now proceed to the formulation of axioms relating to the three categories of problems mentioned. The reader's attention is called to the fact that these axioms must specify which phases in the process of reproduction are to be treated as "random." Such phases are unpredictable in any particular instance but obey some law in terms of relative frequencies. Also, the axioms must be clear about other phases of reproduction, which are to be treated as predetermined or nonrandom. Reviewing the description given in subsection 3·1·1, the reader will notice that the axioms must be concerned (a) with the formation of reproductive cells and (b) with fertilization.


(a) Axioms Relating to Reproductive Cells. 

Axiom 1. At the moment of fertilization each of the parental organisms contains an even number, say 2n, of reproductive cells ready to be fertilized. These 2n cells are produced by the division of n parental cells, each of which divides into two reproductive cells. Each reproductive cell produced contains exactly one barrel of each chromosome pair. The genes carried in the reproductive cells are those present in the parental organism and no others. (Here we ignore the important phenomenon of mutation.) Suppose that in a given locus of a chromosome pair the parental organism carries two identical genes. Then both reproductive cells produced in a given instance carry the same gene. Suppose, however, that in a given locus of a chromosome pair the parental organism carries two different genes, say g and G. Then one of the reproductive cells resulting from the subdivision of a parental cell carries gene g and the other carries gene G. In this case, of the 2n reproductive cells present at the moment of fertilization, exactly n carry gene g and exactly n carry gene G.

Axiom 1 enumerates the nonrandom phases of the formation of reproductive cells.


★Axiom 2. When a parental cell divides to produce two reproductive cells, the breakage of a chromosome pair and the subsequent recombination are random. For every two loci in a chromosome pair, there exists a fixed probability W that at least one break will occur between these loci. The value of the probability W depends on the two loci considered and varies from zero to unity. If the two loci are close, then W is small. If the two loci are distant, then W is large. Suppose that at least one break occurred between the two loci carrying genes (α1, α2) and (β1, β2), respectively. Then the two reproductive cells produced may carry either the combinations of genes

(α1, β1) and (α2, β2)

or the combinations

(α1, β2) and (α2, β1),

respectively. The probability of either possibility is one-half.

Axiom 3. The genetical composition of any given chromosome barrel in a 
reproductive cell is completely independent of the composition of all other 
chromosome barrels of the same cell. 

(b) Axioms Relating to Fertilization. 

Axiom 4. Fertilization is random and can be thought of as consisting in two random selections performed by the forthcoming organism C. Let the number of reproductive cells contained by the mother M be 2n', and let the number contained by the father F be 2n''. The forthcoming organism C selects one reproductive cell from M, and the probability of selecting any particular cell is 1/(2n'). Then C selects one reproductive cell from F, and the probability of selecting any particular cell is 1/(2n''). The two selected reproductive cells combine to produce the first cell of C.

Axiom 5. The genetical composition of the reproductive cell selected by C from F is independent of the genetical composition of the reproductive cell selected from M.
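Axiom 2 can be illustrated by a small Monte Carlo sketch. It follows a parental cell that is heterozygous at two loci of the same chromosome pair. The gene labels G, g (first locus) and H, h (second locus) are hypothetical, echoing the notation of subsection 3·1·2. Under the axiom, a given reproductive cell carries an exchanged combination with probability W · 1/2. The simulated relative frequency should therefore be close to W/2. Picking one of the two resulting cells at random stands in for the selection of Axiom 4. That step, the seed, and the function names are assumptions of the sketch.

```python
import random


def reproductive_cell(barrel1, barrel2, W, rng):
    """Genes at two linked loci carried by one reproductive cell.

    barrel1, barrel2: (first-locus gene, second-locus gene) on the two barrels.
    With probability W at least one break falls between the loci; given a
    break, the original and the exchanged combinations are equally likely.
    One of the two resulting cells is returned at random (assumption).
    """
    if rng.random() < W and rng.random() < 0.5:
        cells = [(barrel1[0], barrel2[1]), (barrel2[0], barrel1[1])]  # exchanged
    else:
        cells = [barrel1, barrel2]                                    # as before
    return rng.choice(cells)


def recombinant_frequency(W, trials=100_000, seed=1):
    """Relative frequency of cells with an exchanged combination (about W/2)."""
    rng = random.Random(seed)
    parental = {("G", "H"), ("g", "h")}
    hits = sum(
        reproductive_cell(("G", "H"), ("g", "h"), W, rng) not in parental
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for W in (0.0, 0.2, 0.6, 1.0):
        print(W, recombinant_frequency(W), W / 2)
```

Even with W = 1 the recombinant frequency cannot exceed one-half, because given a break the exchange happens only half the time.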

In some sections of scientific literature (old-fashioned), including some 
dictionaries, the term axiom is used to denote a “self-evident proposition 
which does not require any proof.” The reader will notice that there is 
nothing self-evident in the above five axioms. Moreover, some of them are 
couched in nonrealistic terms and, on closer examination of experimental 
material, may easily be proved inadequate or false. In this connection, the 
reader is referred to Chapter 1, where it was emphasized that the axioms 
underlying the mathematical treatment of any observable phenomena are 
no more than relatively brief and categorical summaries of permanencies 
which seem to be established at a given moment. The purpose of formulating axioms is to provide a systematic basis for drawing conclusions which form the theory of a given class of phenomena. If these conclusions are consistent with observation to a degree satisfying a research worker, then he will judge the system of axioms satisfactory. Otherwise, he may undertake a revision.

3·2. Probabilities of inheritance from parents

3·2·1. Inheritance of a single pair of genes. Let g and G be a single pair of genes carried by an organism in a given locus of a chromosome pair. An organism of this kind must carry one of the three possible combinations of these genes: gg, gG or GG. Let ξ, η, and ζ each stand for any of these combinations. Our problem in this subsection is to compute probabilities of the type P{C : ξ | (M : η)(F : ζ)}. This is the probability that the progeny C will inherit the composition ξ, given that the genetical compositions of the mother M and father F are η and ζ, respectively. In computing these probabilities we shall use the five axioms of subsection 3·1·3. It is important to be clear which of these axioms are relevant.

Since we are concerned with only one pair of genes it is obvious that 
Axioms 2 and 3, dealing with the breakage of a particular chromosome 
pair and with independence of several chromosomes, respectively, are irrelevant. On the other hand, the computations will depend explicitly on
Axioms 1, 4, and 5. 

Let X and Y denote the reproductive cells selected by C from the mother M and father F, respectively. Using the notation introduced in subsection 3·1·2, we may write

$$
\begin{aligned}
C : gg &= (X : g)(Y : g),\\
C : gG &= (X : g)(Y : G) + (X : G)(Y : g),\\
C : GG &= (X : G)(Y : G),
\end{aligned}
$$

and it follows that, whatever the genetical compositions η and ζ of the mother and father,

$$P\{C : gg \mid (M : \eta)(F : \zeta)\} = P\{(X : g)(Y : g) \mid (M : \eta)(F : \zeta)\}, \tag{3·2·1}$$

$$P\{C : gG \mid (M : \eta)(F : \zeta)\} = P\{(X : g)(Y : G) + (X : G)(Y : g) \mid (M : \eta)(F : \zeta)\}, \tag{3·2·2}$$

$$P\{C : GG \mid (M : \eta)(F : \zeta)\} = P\{(X : G)(Y : G) \mid (M : \eta)(F : \zeta)\}. \tag{3·2·3}$$

It will be noticed that in writing these formulae we use a particular part of Axiom 4. This part asserts that the genetical composition of C is determined by the genes carried in the reproductive cell X selected from M and in the reproductive cell Y selected from F. Next we use the part of Axiom 1 asserting that a reproductive cell may carry either gene g or gene G but not both of them. From it we conclude that the properties (X : g)(Y : G) and (X : G)(Y : g) are exclusive. Therefore, the application of the addition theorem on probabilities to formula (3·2·2) gives




$$P\{C : gG \mid (M : \eta)(F : \zeta)\} = P\{(X : g)(Y : G) \mid (M : \eta)(F : \zeta)\} + P\{(X : G)(Y : g) \mid (M : \eta)(F : \zeta)\}. \tag{3·2·4}$$

The next step is to use Axiom 5, which asserts that the genetical composition of the male reproductive cell is independent of the genetical composition of the female reproductive cell. Furthermore, Axiom 4 implies that the genetical composition of the reproductive cell selected from either parent depends only upon the genetical composition of that parent. With all this taken into account and with the use of the multiplication theorem, formulae (3·2·1), (3·2·2) and (3·2·3) can be rewritten as follows:

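$$P\{C : gg \mid (M : \eta)(F : \zeta)\} = P\{X : g \mid M : \eta\}\,P\{Y : g \mid F : \zeta\}, \tag{3·2·5}$$

$$P\{C : gG \mid (M : \eta)(F : \zeta)\} = P\{X : g \mid M : \eta\}\,P\{Y : G \mid F : \zeta\} + P\{X : G \mid M : \eta\}\,P\{Y : g \mid F : \zeta\}, \tag{3·2·6}$$

$$P\{C : GG \mid (M : \eta)(F : \zeta)\} = P\{X : G \mid M : \eta\}\,P\{Y : G \mid F : \zeta\}. \tag{3·2·7}$$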

These formulae show that, in order to calculate the probability of any 
given genetical composition of C, it is sufficient to compute the probabilities that the two reproductive cells X and Y will carry a specified gene.

With respect to the pair of genes g, G, the composition of M may be either gg, gG, or GG. If M : gg, then Axiom 1 asserts that every reproductive cell of M will carry gene g. Axiom 4 then implies that P{X : g | M : gg} = 1. If M : gG, then Axiom 1 asserts that one-half of the reproductive cells of M will carry gene g and the other half will carry gene G. Then Axiom 4 implies that P{X : g | M : gG} = 1/2. Finally, if M : GG, then, according to Axiom 1, none of the reproductive cells in M carry gene g, and it follows that P{X : g | M : GG} = 0. Repeating this reasoning with respect to gene G, we easily obtain the following set of probabilities relating to the maternal reproductive cell:

$$P\{X : g \mid M : gg\} = P\{X : G \mid M : GG\} = 1, \tag{3·2·8}$$

$$P\{X : g \mid M : gG\} = P\{X : G \mid M : gG\} = \tfrac{1}{2}, \tag{3·2·9}$$

$$P\{X : g \mid M : GG\} = P\{X : G \mid M : gg\} = 0. \tag{3·2·10}$$
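Formulae (3·2·8) through (3·2·10) amount to counting. By Axiom 1, half of the parent's reproductive cells carry each of its two genes. By Axiom 4, every cell is equally likely to be chosen. The sketch below uses exact fractions. Writing a genetical composition as a two-letter string such as "gG" is an assumption of this illustration.

```python
from fractions import Fraction


def gamete_prob(gene: str, genotype: str) -> Fraction:
    """P{X : gene | M : genotype} for a single pair of genes, e.g. ("g", "gG").

    Axiom 1: of the 2n reproductive cells, the number carrying `gene` is
    2n, n or 0 according as the parent carries two, one or no copies of it.
    Axiom 4: each cell is selected with probability 1/(2n).
    """
    if len(genotype) != 2:
        raise ValueError("genotype must name exactly two genes, e.g. 'gG'")
    return Fraction(genotype.count(gene), 2)


if __name__ == "__main__":
    for parent in ("gg", "gG", "GG"):
        print(parent, gamete_prob("g", parent), gamete_prob("G", parent))
```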




The reader will perceive that the same probabilities will be obtained 
for the paternal reproductive cell Y in relation to the same genetical 
compositions of F. Using formulae (3·2·5) through (3·2·10), the reader will have no difficulty in computing the probabilities of the three possible genetical compositions of C for any of the possible combinations of genetical compositions of M and F, as set up in Table 3·1.

Table 3·1

Probabilities of Inheritance of a Single Pair of Genes

                         Father
Mother   Probability     gg     gG     GG
gg       P{C : gg}       1      1/2    0
         P{C : gG}       0      1/2    1
         P{C : GG}       0      0      0
gG       P{C : gg}       1/2    1/4    0
         P{C : gG}       1/2    1/2    1/2
         P{C : GG}       0      1/4    1/2
GG       P{C : gg}       0      0      0
         P{C : gG}       1      1/2    0
         P{C : GG}       0      1/2    1
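Table 3·1 can be regenerated mechanically from (3·2·5) through (3·2·10). The sketch below uses exact fractions. The string encoding of compositions and the function names are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction
from itertools import product

GENOTYPES = ("gg", "gG", "GG")


def gamete_prob(gene, genotype):
    # (3·2·8)-(3·2·10): half of the reproductive cells carry each parental gene
    return Fraction(genotype.count(gene), 2)


def child_probs(mother, father):
    """P{C : gg}, P{C : gG}, P{C : GG} given (M : mother)(F : father)."""
    def px(gene):                      # maternal reproductive cell X
        return gamete_prob(gene, mother)

    def py(gene):                      # paternal reproductive cell Y
        return gamete_prob(gene, father)

    return {
        "gg": px("g") * py("g"),                      # (3·2·5)
        "gG": px("g") * py("G") + px("G") * py("g"),  # (3·2·6)
        "GG": px("G") * py("G"),                      # (3·2·7)
    }


if __name__ == "__main__":
    for m, f in product(GENOTYPES, repeat=2):
        print(m, "x", f, {k: str(v) for k, v in child_probs(m, f).items()})
```

Each of the nine columns of probabilities sums to one, which is a quick consistency check on the table.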


3·2·2. Inheritance of traits depending on a single pair of genes. Suppose gene G completely dominates gene g. Then the progeny of any cross will appear to carry either the recessive trait, say R, or the dominant trait, say D = R̄ (that is, "not R"). The probability that the child C will be an R (= will carry the recessive trait R) is identical with the probability of C : gg. On the other hand, for C to be a D it may be either C : gG or C : GG. Therefore, whatever be the composition of the parents,

$$P\{C = D \mid MF\} = P\{C : gG \mid MF\} + P\{C : GG \mid MF\}.$$

In particular, using the table of probabilities, Table 3·1, it is easy to find that

$$P\{C = D \mid (M : gG)(F : gG)\} = \tfrac{1}{2} + \tfrac{1}{4} = \tfrac{3}{4}.$$

3·2·3. Relative probabilities of stated genetical compositions given the dominant trait. In treating certain problems we will need two probabilities: P{C : gG | MF(C = D)} and P{C : GG | MF(C = D)}. These are the probabilities that a progeny of a given cross M × F is a hybrid and that it is a pure dominant, given that it carries the dominant trait D. Obviously, these probabilities have a meaning only if the parental genetical compositions M and F do not preclude the child from carrying the dominant trait D, i.e., when D is not impossible. The two probabilities mentioned are obtained by direct application of the theorem on relative probabilities. Here D = R̄, gG and R are exclusive, and GG and R are exclusive. Hence we have

$$P\{C : gG \mid MF(C = D)\} = \frac{P\{C : gG \mid MF\}}{1 - P\{C = R \mid MF\}},$$

$$P\{C : GG \mid MF(C = D)\} = \frac{P\{C : GG \mid MF\}}{1 - P\{C = R \mid MF\}}.$$


In particular, if both parents are hybrids, then 


P[C :gG \(M: gG)(F : gG)(C = D)} = f, 


P{C :GG\iM : gG)(F : gG)(C = D)} = i 

which means, in effect, that out of all the dominant looking progeny of 
two hybrids, two-thirds are hybrids and one-third are pure dominants. 
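The two formulae above amount to dividing by P{C = D | MF}. The minimal Python sketch below assumes that the three absolute probabilities for the cross are already known, for instance from Table 3·1. The function name `given_dominant` is our own.

```python
from fractions import Fraction

def given_dominant(p_gg, p_gG, p_GG):
    """Relative probabilities of C : gG and C : GG given C = D.

    Each absolute probability is divided by P{C = D | MF} = 1 - P{C = R | MF},
    where C = R is the same event as C : gg.
    """
    p_dominant = 1 - p_gg
    if p_dominant == 0:
        raise ValueError("the dominant trait D is impossible for this cross")
    return p_gG / p_dominant, p_GG / p_dominant

# Both parents hybrids: Table 3·1 gives 1/4, 1/2, 1/4.
hybrid, pure = given_dominant(Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))
print(hybrid, pure)  # 2/3 1/3
```

The explicit check for p_dominant == 0 corresponds to the remark that these relative probabilities have a meaning only when D is not impossible.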


★3·2·4. Inheritance of several unlinked pairs of genes. Consider m pairs
of genes g₁G₁, g₂G₂, ⋯, gₘGₘ and assume that no two of these pairs
are carried in the same chromosome pair. In this case the m pairs of genes
are called unlinked. Let ξ, η, and ζ be three arbitrary genetical composi¬
tions with respect to these genes. In this subsection we shall consider the
problem of computing the probability P{C : ξ | (M : η)(F : ζ)}. For this
purpose we shall present ξ, η, and ζ as logical products of m factors. The
definition of ξ must specify the combination of genes of the first pair
g₁, G₁ which the organism C is supposed to carry. Let this combination
be denoted by ξ₁. Thus ξ₁ stands either for g₁g₁ or for g₁G₁ or for G₁G₁.
Similarly, when ξ is specified, something must be said about the second
pair of genes g₂, G₂. This particular prescription will be denoted by ξ₂.
In general, ξₖ will denote the particular combination of genes forming the
kth pair mentioned in the definition of ξ, for k = 1, 2, ⋯, m. Thus the
property C : ξ of the child can be presented as a logical product

$$C : \xi = \prod_{k=1}^{m} (C : \xi_k).$$

★For example, ξ may stand for g₁G₁, g₂G₂, ⋯, gₘGₘ. Then for every
k = 1, 2, ⋯, m, the meaning of ξₖ is gₖGₖ. Proceeding in a similar way
we represent the genetical compositions of the two parents as logical
products of m genetical compositions each relating to a separate pair of
genes, say

$$M : \eta = \prod_{k=1}^{m} (M : \eta_k) \quad\text{and}\quad F : \zeta = \prod_{k=1}^{m} (F : \zeta_k).$$

★After these preliminary remarks, the computation of the probability
P{C : ξ | (M : η)(F : ζ)} is easily completed by reference to Axioms 3, 4,
and 5 and to the results of subsection 3·2·1. The three axioms imply that
the inheritance of any combination of genes carried in one chromosome
pair is completely independent of the inheritance of genes carried in all
other chromosome pairs. Therefore, by the multiplication theorem, and
because the inheritance of the kth pair depends only on the parental
compositions ηₖ and ζₖ with respect to that pair,

$$\begin{aligned}
P\{C : \xi \mid (M : \eta)(F : \zeta)\} &= P\Big\{\prod_{k=1}^{m} (C : \xi_k) \,\Big|\, (M : \eta)(F : \zeta)\Big\} \\
&= \prod_{k=1}^{m} P\{C : \xi_k \mid (M : \eta)(F : \zeta)\} \\
&= \prod_{k=1}^{m} P\{C : \xi_k \mid (M : \eta_k)(F : \zeta_k)\}.
\end{aligned}\tag{3·2·11}$$

★Thus, whatever be the genetical composition ξ = ∏ₖ ξₖ with respect
to the m unlinked pairs of genes g₁G₁, g₂G₂, ⋯, gₘGₘ, in order to compute
the probability that the child will inherit ξ when the two parents have
the specified genetical compositions η = ∏ₖ ηₖ and ζ = ∏ₖ ζₖ, respectively,
the easiest way to proceed is to treat the m pairs of genes separately, as indi¬
cated in subsection 3·2·1, and to multiply the resulting probabilities.

★In order to illustrate this result we shall compute the probability

$$P\Big\{C : g_1G_1, g_2G_2, g_3g_3, G_4G_4 \,\Big|\, \prod_{i=1}^{4} (M : g_iG_i) \prod_{k=1}^{4} (F : g_kG_k)\Big\}$$

that the progeny C of the cross between two quadruple hybrids,

M : g₁G₁, g₂G₂, g₃G₃, g₄G₄,
F : g₁G₁, g₂G₂, g₃G₃, g₄G₄,

will be a hybrid with respect to the first two pairs of genes, a pure recessive
with respect to the third and a pure dominant with respect to the fourth.
On the assumption that the four pairs of genes are unlinked we refer to
formula (3·2·11) and compute the probabilities relating to each pair of
genes separately, using Table 3·1.

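$$\begin{aligned}
P\{C : g_1G_1 \mid (M : g_1G_1)(F : g_1G_1)\} &= P\{C : g_2G_2 \mid (M : g_2G_2)(F : g_2G_2)\} = \tfrac12, \\
P\{C : g_3g_3 \mid (M : g_3G_3)(F : g_3G_3)\} &= \tfrac14, \\
P\{C : G_4G_4 \mid (M : g_4G_4)(F : g_4G_4)\} &= \tfrac14.
\end{aligned}$$

Multiplying these results we obtain

$$P\Big\{C : g_1G_1, g_2G_2, g_3g_3, G_4G_4 \,\Big|\, \prod_{i=1}^{4} (M : g_iG_i) \prod_{k=1}^{4} (F : g_kG_k)\Big\} = \tfrac12 \cdot \tfrac12 \cdot \tfrac14 \cdot \tfrac14 = \tfrac{1}{64}.$$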
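Formula (3·2·11) reduces the computation to a product of single-pair probabilities. The Python sketch below makes that product explicit. It assumes an encoding of our own: every pair is written with the same letters g, G, and the pair index k is given by the position in the list. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod

def pair_probability(mother, father, child):
    """P{C : child | (M : mother)(F : father)} for one pair of genes.

    Each of the four combinations (maternal gene, paternal gene) has
    probability 1/4; genes are compared as an unordered pair.
    """
    target = sorted(child)
    hits = sum(1 for x, y in product(mother, father) if sorted(x + y) == target)
    return Fraction(hits, 4)

def unlinked_probability(mother, father, child):
    """Formula (3·2·11): the product of the single-pair probabilities."""
    return prod(pair_probability(m, f, c) for m, f, c in zip(mother, father, child))

# Two quadruple hybrids; the child is to be gG, gG, gg, GG for pairs 1 to 4.
M = ["gG"] * 4
F = ["gG"] * 4
C = ["gG", "gG", "gg", "GG"]
print(unlinked_probability(M, F, C))  # 1/64
```

The sketch is valid only for unlinked pairs. For linked pairs the factorization fails, and the following subsections deal with that case.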


★3·2·5. Two pairs of linked genes. Probabilities relating to reproductive
cells. If two (or more) pairs of genes are carried in the same chromosome
pair, then they are called linked. Consider two linked pairs of genes, and
denote by W the probability that in the process of the formation of the
reproductive cells at least one break of the chromosome pair will occur
between the loci of these genes. For the purposes of this subsection we
shall abandon the convention of using capital and lower-case letters to
denote the dominant and recessive genes, respectively. Instead, we will
use capital and lower-case letters to distinguish between the genes carried
in the two barrels of the chromosome pair. Arbitrarily denote one of the
barrels as the first and the other as the second. Then the two genes carried
in the first barrel will be denoted by g₁ and g₂, and the two genes carried
in the second barrel by G₁ and G₂, irrespective of whether these genes are
dominant or recessive.

★Consider one of the parental organisms, e.g., M, and denote by X
the particular reproductive cell of M which the progeny C inherits. In
this subsection we shall compute the probabilities, determined by the
genetical composition of M, that X will have a specified genetical compo¬
sition. The results obtained will also be applicable to the reproductive
cell Y inherited by C from the father F.

★We begin by noticing that we are faced with a new problem only when
M is a hybrid with respect to both pairs of genes considered (when M is
a "double hybrid," for short). In fact, in the other two cases, when M is
a "double homozygous" or when M is homozygous with respect to one
pair of genes and hybrid with respect to the other, the problem of in¬
heritance of the two pairs of genes is immediately reduced to the problem
of one pair of genes. When M is a double homozygous, i.e., when both
letters g₁ and G₁ represent the same gene, say α, and when both g₂ and G₂
represent the same gene, say β, we have M : αα, ββ, so that Axioms 1
and 4 imply that the only possible genetical composition of X is X : α, β.
Thus, in this case, P{X : α, β | M : αα, ββ} = 1 and the probabilities of
all other genetical compositions of X with respect to the two pairs of
genes considered are equal to zero.





★Consider now the case when M is homozygous with respect to the
first pair of genes and heterozygous with respect to the other. Then the
letters g₁ and G₁ represent the same gene, say α, but the letters g₂ and G₂
represent two different genes. Thus, we have M : αα, g₂G₂. In this case,
Axioms 1 and 4 imply that X can have but two different genetical compo¬
sitions, namely either α, g₂ or α, G₂, with probabilities
P{X : α, g₂ | M : αα, g₂G₂} = P{X : α, G₂ | M : αα, g₂G₂} = ½. The prob¬
abilities of all the other genetical compositions of X with respect to the
same pairs of genes equal zero.

★Let us now turn to the last possibility, namely that M is a double
hybrid. This presents a problem essentially different from those considered
in the preceding subsection. Now g₁, G₁, g₂, and G₂ all represent different
genes. Consequently, X has one of the four following genetical composi¬
tions:

g₁, g₂;   G₁, G₂;   g₁, G₂;   G₁, g₂.

★Consider the probability that X will have the first of these composi¬
tions. This probability P{X : g₁, g₂ | M : g₁G₁, g₂G₂} can be considered
as that of the logical product of two properties of X: that X will contain
gene g₁ from the first pair of genes and that it will contain g₂ from the
second pair. Applying the multiplication theorem we write

$$P\{X : g_1, g_2 \mid M : g_1G_1, g_2G_2\} = P\{X : g_1 \mid M : g_1G_1, g_2G_2\}\; P\{X : g_2 \mid (M : g_1G_1, g_2G_2)(X : g_1)\}.$$

★The first factor on the right-hand side is the absolute probability that
a reproductive cell of an organism hybrid with respect to g₁, G₁ will contain
gene g₁. As we have seen before, this probability is equal to one-half.
The second factor on the right-hand side is the relative probability, given
X : g₁, that X will contain the gene g₂ carried by M in the same chromo¬
some barrel as g₁. Using Axiom 3 we notice that this can occur in two
different ways which are mutually exclusive. Either in the course of forma¬
tion of the reproductive cell the relevant chromosome pair does not break
between the loci of the two pairs of genes (the probability of this is 1 − W),
or the chromosome pair does break at least once (the probability of this
is W) and then the gene g₂ joins g₁ in the process of recombination (the
probability of this is one-half). The application of the addition and multi¬
plication theorems then gives

$$P\{X : g_2 \mid (M : g_1G_1, g_2G_2)(X : g_1)\} = 1 - W + \tfrac12 W = 1 - \tfrac12 W,$$

and it follows that

$$P\{X : g_1, g_2 \mid M : g_1G_1, g_2G_2\} = \tfrac12\left(1 - \tfrac12 W\right). \tag{3·2·12}$$

★The reader will notice that in deducing this formula the premises used
were (i) that M is a double hybrid and (ii) that the genes g₁ and g₂ are
carried in M in the same chromosome barrel. Thus the same reasoning
can be applied to obtain the probability that X : G₁, G₂,

$$P\{X : G_1, G_2 \mid M : g_1G_1, g_2G_2\} = \tfrac12\left(1 - \tfrac12 W\right). \tag{3·2·13}$$

★In order to complete the solution of the problem, it still remains
for us to compute the probability that X will inherit two genes situated
in M in two different barrels of the same chromosome pair. We shall compute
the probability that X : g₁, G₂. It is obvious that the probability that
X : G₁, g₂ has the same value. Proceeding as before we have

$$\begin{aligned}
P\{X : g_1, G_2 \mid M : g_1G_1, g_2G_2\} &= P\{X : g_1 \mid M : g_1G_1, g_2G_2\}\; P\{X : G_2 \mid (M : g_1G_1, g_2G_2)(X : g_1)\} \\
&= \tfrac12\, P\{X : G_2 \mid (M : g_1G_1, g_2G_2)(X : g_1)\}.
\end{aligned}$$

★The probability on the extreme right is the relative probability, given
X : g₁, that the reproductive cell X will contain gene G₂ carried by M in
the chromosome barrel different from that carrying g₁. By Axiom 3 this
can happen only if in the course of formation of the reproductive cells
the chromosome pair breaks at least once between the loci of the two
pairs of genes (the probability of this is W) and if G₂ joins g₁ in the process
of recombination (the probability of this is one-half). It follows that

$$P\{X : G_2 \mid (M : g_1G_1, g_2G_2)(X : g_1)\} = \tfrac12 W \tag{3·2·14}$$

and, therefore,

$$P\{X : g_1, G_2 \mid M : g_1G_1, g_2G_2\} = \tfrac14 W.$$

By the same method

$$P\{X : G_1, g_2 \mid M : g_1G_1, g_2G_2\} = \tfrac14 W. \tag{3·2·15}$$

★The probabilities (3·2·12) through (3·2·15) are of considerable in¬
terest. Since 0 ≤ W ≤ 1, it is seen that the probability of the reproductive
cell X carrying a specified combination of two genes situated in the parental
organism in the same chromosome barrel is at least equal to one-quarter
and may be as large as one-half. On the other hand, the probability that
the reproductive cell X will carry a specified combination of two genes
situated in the parental organism in two different barrels of the same chro¬
mosome pair is at most equal to one-quarter and may be as small as zero. For
this reason, the link between genes carried in the same chromosome barrel
is called positive, and the link between genes carried in two different
barrels of the same chromosome pair is called negative. Also, to distinguish
these two situations we occasionally speak of "coupling" and of "repulsion."

★In the extreme case when W is close to unity, the link between the
genes tends to disappear, and they are inherited more or less as though
they were carried in different chromosome pairs. On the other hand, if W
is very small, then the two genes carried in the same chromosome barrel
are almost always inherited jointly.
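Formulae (3·2·12) through (3·2·15) can be collected into a single function of W. In the Python sketch below, the `phase` keyword is our own device: "coupling" means that g₁ and g₂ share a barrel, and "repulsion" means that g₁ shares a barrel with G₂. The function name is likewise hypothetical.

```python
from fractions import Fraction

def gamete_probabilities(W, phase="coupling"):
    """Probabilities of the four reproductive cells X of a double hybrid M.

    Formulae (3·2·12)-(3·2·15): two genes from the same barrel of M are
    inherited together with probability (1/2)(1 - W/2); two genes from
    different barrels with probability W/4.
    """
    W = Fraction(W)
    if not 0 <= W <= 1:
        raise ValueError("W is a probability and must lie in [0, 1]")
    same = Fraction(1, 2) * (1 - W / 2)
    cross = W / 4
    if phase == "coupling":
        return {("g1", "g2"): same, ("G1", "G2"): same,
                ("g1", "G2"): cross, ("G1", "g2"): cross}
    if phase == "repulsion":
        return {("g1", "G2"): same, ("G1", "g2"): same,
                ("g1", "g2"): cross, ("G1", "G2"): cross}
    raise ValueError("phase must be 'coupling' or 'repulsion'")

for W in ("0", "1/10", "1/2", "1"):
    p = gamete_probabilities(W)
    print(f"W = {W}:  same barrel {p['g1', 'g2']},  different barrels {p['g1', 'G2']}")
```

The printout illustrates the two extremes discussed above. With W = 0, the genes of a barrel always travel together (probabilities ½ and 0). With W = 1, all four cells have probability ¼, as they would for unlinked genes.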

★Linkage of two pairs of genes introduces a distinction between the
two types of organisms which are heterozygous with respect to these two
pairs of genes. Returning to the usual convention of denoting dominant
genes by capital letters and recessive genes by lower-case letters, let g₁G₁
and g₂G₂ be two pairs of linked genes with W representing the probability
of at least one break of the chromosome pair between their loci. Further,
let H₁ and H₂ stand for the hybrid organisms obtained from the following
crosses:

H₁ = G₁G₁, G₂G₂ × g₁g₁, g₂g₂,
H₂ = G₁G₁, g₂g₂ × g₁g₁, G₂G₂.

It will be seen that in H₁ both dominant genes G₁ and G₂ are carried in
one barrel, while both recessive genes g₁ and g₂ are in the other barrel of
the same chromosome pair. On the other hand, in H₂ the dominant G₁
is "coupled" in the same barrel with the recessive gene g₂, while the
recessive g₁ is coupled with the dominant G₂ in the other barrel of the
same chromosome pair. Externally, the two hybrids H₁ and H₂ look
exactly the same. Moreover, the frequency of inheritance by their progeny
of any one single pair of genes follows the usual laws. However, the in¬
heritance from H₁ of the combinations of both pairs of genes is not the
same as the inheritance from H₂. In fact, due to the positive and negative
linkage just explained,

$$\begin{aligned}
P\{X : G_1 \mid H_1(X : G_2)\} &= 1 - \tfrac12 W = P\{X : G_1 \mid H_2(X : g_2)\}, \\
P\{X : G_1 \mid H_1(X : g_2)\} &= \tfrac12 W = P\{X : G_1 \mid H_2(X : G_2)\}.
\end{aligned}$$

For this reason, when dealing with linked pairs of genes, it is necessary to
adopt a notation which distinguishes between the two kinds of "double
hybrids." To do so, we will agree, when writing symbols representing con¬
secutive pairs of genes, that all the first symbols of each pair of symbols
will be positively linked genes carried in the same barrel of a chromosome
pair and also that all the second symbols will be positively linked. Thus,
the notation for the two hybrids H will be, respectively,

H₁ : G₁g₁, G₂g₂,
H₂ : G₁g₁, g₂G₂,

indicating that the positively linked genes in H₁ are G₁ and G₂, g₁ and g₂.
On the other hand, the genes which are coupled in H₂ are G₁ and g₂, g₁
and G₂. Because of the distinction between the two types of double
hybrids, the total number of genetical types different with respect to two
pairs of linked genes is equal to ten, as follows:

(1) g₁g₁, g₂g₂    (2) g₁g₁, g₂G₂    (3) g₁g₁, G₂G₂
(4) g₁G₁, g₂g₂    (5) g₁G₁, g₂G₂    (6) g₁G₁, G₂g₂
(7) g₁G₁, G₂G₂    (8) G₁G₁, g₂g₂    (9) G₁G₁, g₂G₂    (10) G₁G₁, G₂G₂.
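The count of ten can be checked by listing the unordered pairs of possible barrels. Under the assumption that a genetical type is fully described by the two barrels it carries, the Python sketch below enumerates those pairs. It then shows that forgetting which genes share a barrel merges the two double hybrids, (5) and (6), which leaves nine types. The helper name `ignore_phase` is hypothetical.

```python
from itertools import combinations_with_replacement

# The four possible barrels, written (gene of pair 1, gene of pair 2).
BARRELS = [("g1", "g2"), ("g1", "G2"), ("G1", "g2"), ("G1", "G2")]

# A genetical type is an unordered pair of barrels:
# 4 pairs of identical barrels + 6 pairs of distinct barrels = 10.
linked_types = list(combinations_with_replacement(BARRELS, 2))

def ignore_phase(pair):
    """Forget which genes share a barrel, keeping only the genes of each pair."""
    (a1, a2), (b1, b2) = pair
    return tuple(sorted((a1, b1))), tuple(sorted((a2, b2)))

print(len(linked_types))                             # 10
print(len({ignore_phase(p) for p in linked_types}))  # 9
```

The second count is the number of types when the genes are unlinked, because then the coupling and repulsion double hybrids cannot be told apart.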

3·2·6. Two pairs of linked genes. Probabilities of inheritance from
parents. Due to the postulated independence of the compositions of the
reproductive cells from the two parents, to compute the probability of
any one of these ten types in the progeny of a given cross, it is sufficient
to write down the probabilities of the particular compositions of repro¬
ductive cells and to perform the requisite multiplications and, if necessary,
additions.

★To illustrate the procedure, we will compute the probabilities that the
progeny C of the two hybrids M : g₁G₁, g₂G₂ and F : g₁G₁, G₂g₂ will inherit any
of the ten different combinations of genes. For this purpose it is con¬
venient to plan the computation in a 4 × 4 table. First, write the possible
compositions of the maternal reproductive cell X and their respective
probabilities in the two marginal rows. Similarly, use two marginal col¬
umns to write the possible genetical compositions and probabilities of the
paternal reproductive cell Y.


Table 3·2

Probabilities of Inheritance of Two Pairs of Linked Genes

Mother M : g₁G₁, g₂G₂ (reproductive cell X, columns); father F : g₁G₁, G₂g₂ (reproductive cell Y, rows). The number in parentheses after each entry is the genetical type of the progeny.

                           X : g₁, g₂         X : g₁, G₂        X : G₁, g₂        X : G₁, G₂
                           ½(1 − ½W)          ¼W                ¼W                ½(1 − ½W)
Y : g₁, g₂    ¼W           ⅛W(1 − ½W) (1)     W²/16 (2)         W²/16 (4)         ⅛W(1 − ½W) (5)
Y : g₁, G₂    ½(1 − ½W)    ¼(1 − ½W)² (2)     ⅛W(1 − ½W) (3)    ⅛W(1 − ½W) (6)    ¼(1 − ½W)² (7)
Y : G₁, g₂    ½(1 − ½W)    ¼(1 − ½W)² (4)     ⅛W(1 − ½W) (6)    ⅛W(1 − ½W) (8)    ¼(1 − ½W)² (9)
Y : G₁, G₂    ¼W           ⅛W(1 − ½W) (5)     W²/16 (7)         W²/16 (9)         ⅛W(1 − ½W) (10)

★Next, fill in the body of the table with the products of the probabilities
in the marginal column and row. These products represent the probabilities
that the progeny C will arise from the corresponding pair of reproductive
cells. We now mark the 16 combinations of the reproductive cells according
to the ten genetical types of the progeny. For this purpose the right-
hand corner of each cell in the table may be used. Adding up the prob¬
abilities in the table which correspond to each genetical type, we obtain
the final results

$$\begin{aligned}
P\{C : g_1g_1, g_2g_2\} = P\{C : g_1g_1, G_2G_2\} = P\{C : G_1G_1, g_2g_2\} = P\{C : G_1G_1, G_2G_2\} &= \tfrac18 W\left(1 - \tfrac12 W\right), \\
P\{C : g_1g_1, g_2G_2\} = P\{C : g_1G_1, g_2g_2\} = P\{C : G_1G_1, g_2G_2\} = P\{C : g_1G_1, G_2G_2\} &= \tfrac14\left[1 - W\left(1 - \tfrac12 W\right)\right], \\
P\{C : g_1G_1, g_2G_2\} = P\{C : g_1G_1, G_2g_2\} &= \tfrac14 W\left(1 - \tfrac12 W\right).
\end{aligned}$$

★Apart from probabilities of genetical types, it is interesting to compute
the probabilities of different combinations of traits. Let D₁ and D₂ stand
for the dominant traits corresponding to the two pairs of genes and R₁
and R₂ for the recessive traits. Then there are four possible combinations,
R₁R₂, R₁D₂, D₁R₂, and D₁D₂. Using the above probabilities of the ten
genetical types in C we have

$$\begin{aligned}
P\{C = R_1R_2\} &= \tfrac18 W\left(1 - \tfrac12 W\right), \\
P\{C = R_1D_2\} &= \tfrac14 - \tfrac18 W\left(1 - \tfrac12 W\right), \\
P\{C = D_1R_2\} &= \tfrac14 - \tfrac18 W\left(1 - \tfrac12 W\right), \\
P\{C = D_1D_2\} &= \tfrac12 + \tfrac18 W\left(1 - \tfrac12 W\right).
\end{aligned}$$

It is seen that the probabilities of the four combinations of traits depend
in a simple way upon the expression representing the first of them. The
reader is advised to substitute several values of W, say W = .1, .5, etc.,
and also W = 1.0, into the formulae and to compare the resulting probabilities.
As was mentioned, the probabilities with W = 1.0 correspond to the in¬
heritance of unlinked genes; they reduce to 1/16, 3/16, 3/16 and 9/16.
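The construction of Table 3·2 lends itself to a mechanical check. The Python sketch below forms all 16 products of maternal and paternal cell probabilities and groups them by trait combination. It then compares the result with the four closed-form expressions. The encoding of a cell as a pair of letters and the function names are our own assumptions.

```python
from fractions import Fraction
from itertools import product

def gametes(W, phase):
    """Reproductive cells of a double hybrid, following (3·2·12)-(3·2·15).

    A cell is written (gene of pair 1, gene of pair 2), recessive genes in
    lower case. In "coupling" g1 and g2 share a barrel; otherwise
    ("repulsion") g1 shares a barrel with G2.
    """
    same, cross = Fraction(1, 2) * (1 - W / 2), W / 4
    if phase == "coupling":
        return {("g", "g"): same, ("G", "G"): same, ("g", "G"): cross, ("G", "g"): cross}
    return {("g", "G"): same, ("G", "g"): same, ("g", "g"): cross, ("G", "G"): cross}

def trait_probabilities(W):
    """Traits of the progeny of M : g1G1, g2G2 x F : g1G1, G2g2 (Table 3·2)."""
    W = Fraction(W)
    traits = {}
    cells = product(gametes(W, "coupling").items(), gametes(W, "repulsion").items())
    for (x, px), (y, py) in cells:
        # A pair shows the recessive trait only when both cells carry its recessive gene.
        t = ("R1" if x[0] == y[0] == "g" else "D1") + ("R2" if x[1] == y[1] == "g" else "D2")
        traits[t] = traits.get(t, Fraction(0)) + px * py
    return traits

def closed_form(W):
    """The four formulae given in the text, with a = W(1 - W/2)/8."""
    W = Fraction(W)
    a = W * (1 - W / 2) / 8
    return {"R1R2": a, "R1D2": Fraction(1, 4) - a, "D1R2": Fraction(1, 4) - a,
            "D1D2": Fraction(1, 2) + a}

for W in ("1/10", "1/2", "1"):
    probs = trait_probabilities(W)
    assert probs == closed_form(W)
    print(W, {t: str(probs[t]) for t in ("R1R2", "R1D2", "D1R2", "D1D2")})
```

For W = 1, the last line of output shows the 1/16, 3/16, 3/16, 9/16 split that is familiar for two unlinked pairs.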

★As a second example relating to the inheritance of two pairs of linked
genes, we will consider the cross between a double recessive M : g₁g₁, g₂g₂
and a hybrid F : g₁G₁, g₂G₂. The student will notice that in this case
there are only four genetical compositions which the child may possess,
depending upon the four possible compositions of the paternal reproductive
cell. Since the genetical composition of the maternal reproductive cell is
perfectly determined, so that P{X : g₁, g₂} = 1, the probabilities of the
genetical compositions of the progeny are equal to the probabilities given in
the marginal row of Table 3·2.* Hence

$$\begin{aligned}
P\{C : g_1g_1, g_2g_2\} &= \tfrac12\left(1 - \tfrac12 W\right) = P\{C = R_1R_2\}, \\
P\{C : g_1g_1, g_2G_2\} &= \tfrac14 W = P\{C = R_1D_2\}, \\
P\{C : g_1G_1, g_2g_2\} &= \tfrac14 W = P\{C = D_1R_2\}, \\
P\{C : g_1G_1, g_2G_2\} &= \tfrac12\left(1 - \tfrac12 W\right) = P\{C = D_1D_2\}.
\end{aligned}$$

Crosses of this type are used to estimate empirically the probability W
of at least one break of the chromosome pair between the loci of the genes
considered. Since each of the types R₁D₂ and D₁R₂ has probability ¼W,
the estimate of W is provided by twice the sum of the relative
frequencies of the types R₁D₂ and D₁R₂.
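The estimate described above is a one-line computation. The Python sketch below applies it first to a set of hypothetical counts, which are invented for illustration and are not data from the text. It then applies it to a simulated backcross drawn from the four probabilities just given. The seed, the sample size and the function names are arbitrary choices.

```python
import random

def estimate_W(counts):
    """Twice the sum of the relative frequencies of R1D2 and D1R2.

    In the backcross g1g1, g2g2 x g1G1, g2G2 each of these types has
    probability W/4, so this statistic has expectation W.
    """
    total = sum(counts.values())
    return 2 * (counts["R1D2"] + counts["D1R2"]) / total

def simulate_backcross(W, n, seed=0):
    """Draw n children using the four type probabilities of the backcross."""
    rng = random.Random(seed)
    types = ["R1R2", "R1D2", "D1R2", "D1D2"]
    weights = [(1 - W / 2) / 2, W / 4, W / 4, (1 - W / 2) / 2]
    counts = dict.fromkeys(types, 0)
    for t in rng.choices(types, weights=weights, k=n):
        counts[t] += 1
    return counts

# Hypothetical counts, for illustration only.
print(estimate_W({"R1R2": 430, "R1D2": 70, "D1R2": 80, "D1D2": 420}))  # 0.3

# Simulated backcross with W = 0.3: the estimate should land close to 0.3.
print(estimate_W(simulate_backcross(0.3, 100_000)))
```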

★As was emphasized earlier, genes are hypothetical entities invented to
explain the frequencies of inheritance of this or that combination of traits.
By estimating the probabilities W for several pairs of linked genes, it is
possible to map them along the chromosome pairs. The machinery of
inheritance of linked genes as presented here is modeled on the lines
drawn by the remarkable discoveries of T. H. Morgan and his school [7].

3·3. Study of successive generations

3·3·1. Introduction. Up to the present, we have been considering the
simplest type of problems of heredity: given the genetical compositions of
the parents, compute the probability of a specified genetical composition
of the child. Thus, the elements of the fundamental probability set were
the fertilized cells produced by combinations of reproductive cells of the
two parents. Now we will consider several very interesting problems of a,
so to speak, higher order: problems on the distribution of various genetical
types among the individuals forming successive generations which repro¬
duce according to certain fixed rules. In this subsection we will explain
the general pattern of these problems, which form simplified models of
certain processes of evolution.

Each problem of successive generations will be studied in relation to a
set Γ of several pairs of genes, frequently of a single pair of genes, say
g, G. Whatever the number of genes involved, we shall distinguish all
the possible combinations of these genes which may be carried by particular
individuals. These combinations will be described as genetical types and
denoted by some appropriate letters, say T₁, T₂, ⋯, Tₘ. In the case
of just one pair of genes there will be only three types, gg, gG and GG. In
the case of two pairs of genes, there will be nine different types if the genes
are unlinked and ten if they are linked.

*The reader will notice that the marginal row of Table 3·2 is used rather than the
marginal column because we consider F : g₁G₁, g₂G₂ rather than F : g₁G₁, G₂g₂.

In each problem, we will consider a sequence of generations, each born
out of parental organisms belonging to the preceding generation. In some
cases, an "original" generation may be assumed, perhaps composed of
individuals who suddenly settled on an uninhabited island, or representing
the material with which a breeder begins his work. In other cases, there
will be no postulated origin in the sequence of successive generations, and
the particular generation at which the study begins will be assumed to
have been born in the same circumstances as those applying to the fol¬
lowing generations.

Whatever the case may be, the study begins by considering a generation
Π₀ of individuals falling into m genetical types T₁, T₂, ⋯, Tₘ, with
proportions denoted perhaps by P₀₁, P₀₂, ⋯, P₀ₘ or by some other con¬
venient symbols. The generation Π₀ will be described as the original
generation born, and the proportions P₀ᵢ will give the "distribution of
genetical types" in Π₀. In some problems it will be assumed that the
distribution of genetical types among the males may be different from
that among the females. In other cases, they will be assumed to be identical.

The next element for our consideration is the group of individuals π₁
described as the first-generation mating. It is always composed of indi¬
viduals of the original generation born, but not necessarily of all these
individuals. In fact, in some problems it will be specifically assumed that
certain types of individuals belonging to Π₀ are prevented from entering
the first-generation mating. The distribution of genetical types in π₁ has
its proportions denoted by lower-case letters, say p₁₁, p₁₂, ⋯, p₁ₘ.
In accordance with the nature of each problem, a relation is defined which
permits us to compute the distribution of types in the first-generation
mating from that in the original generation born Π₀. This relation will be
described as the selection rule. In particular, we will consider cases where
the distribution of genetical types in π₁ is the same as that in Π₀. This
will be the case of no selection.

Having established the distribution of genetical types in the first-gene¬
ration mating, the conditions of each problem specify the system of
mating in π₁. These conditions are expressed in probabilistic terms, with
the fundamental probability set composed of children. Thus, an element
of this F.P.S. is a child C born out of a mating of two individuals from the
population π₁. Each child has a perfectly determined mother M and a
perfectly determined father F. However, a male individual of the popula¬
tion π₁ may be the father of several children having the same or different
mothers. Similarly, each female member of the population may have
several children, all by the same or by different fathers. The genetical types
of M, F and C are treated as properties of the element C of the F.P.S., and
the conditions of each problem determine the probabilities that M and
F have specified genetical types. In all cases these probabilities depend
upon the distribution of genetical types in the population mating.

The children born out of the first population mating form the first
population born Π₁, and all the conditions enumerated above determine
the distribution of genetical types in Π₁, denoted by P₁₁, P₁₂, ⋯, P₁ₘ.
In all the problems considered, it is assumed that all the processes leading
from Π₀ to Π₁ apply again to Π₁ and lead to the second population mating
π₂ and later to the second generation born Π₂, etc. Generally, the letters
πₙ and pₙ₁, pₙ₂, ⋯, pₙₘ stand for the nth generation mating and for
the probabilities of genetical types in πₙ. Πₙ stands for the nth population
born out of the mating in πₙ, and the probabilities Pₙ₁, Pₙ₂, ⋯, Pₙₘ
determine the distribution of the m genetical types in Πₙ. The sequence of
processes just described may be represented graphically as follows:

Π₀ → π₁ → Π₁ → π₂ → Π₂ → ⋯ → πₙ → Πₙ → ⋯

In each step from one generation to another, the conditions of selection 
and the rules of mating are assumed to be the same. Each problem of the 
category described is concerned with formulae for the probabilities Pnk 
representing the distribution of genetical types in the nth generation born. 

The problem of distribution of types in successive generations born 
under a specific system of mating has attracted the attention of a number 
of leading mathematicians and biologists. The literature produced is quite 
extensive, and the references given at the end of this chapter are intended 
to introduce the reader to a few representative works where he can find 
further guidance. 

The first mathematician of great repute who worked on successive gene¬
rations seems to have been G. H. Hardy [6]. Interesting treatments of some
further problems are due to Felix Bernstein [1] and to the renowned
Russian probabilist Serge Bernstein [2]. In more recent times, an important
paper was published by Hilda Geiringer [4].

The contributions of mathematicians reflect the natural tendency to¬
wards precision, but this gain in precision is paid for by a loss in realism.
More realistic treatments of the problem of evolution are to be found in
the writings of biologists, of whom we shall mention T. Dobzhansky [3],
J. B. S. Haldane [5] and Sewall Wright [9].


3·3·2. Panmixia. The word panmixia seems to have been introduced by
two English scholars, Sir Francis Galton (1822-1911) and Karl Pearson 
(1857-1936), to denote the kind of mating in which mates are selected 
independently of their hereditary characteristics. A specific definition of 
this concept in terms of genetical compositions is given below. 

Consider a set Γ of pairs of genes and all the possible combinations of
these genes that an individual can carry, say,

$$T_1, T_2, \ldots, T_m. \tag{3·3·1}$$

Let π be the population mating and let

$$p_1', p_2', \ldots, p_m'$$

represent the proportions of females in π having the different genetical
types (3·3·1), respectively. Thus pᵢ′ stands for the proportion of females
in π who have the genetical type Tᵢ, for i = 1, 2, ⋯, m. Further, let

$$p_1'', p_2'', \ldots, p_m''$$

be the proportions of males in π having the different genetical types (3·3·1).
Now consider the set S(π) of children born from individuals of the popula¬
tion π. Denote by P{M : Tᵢ | C} the proportion of those children whose
mothers have the genetical type Tᵢ, for i = 1, 2, ⋯, m. Thus P{M : Tᵢ | C}
is the absolute probability that the child's mother is of genetical type Tᵢ.
Similarly, P{F : Tᵢ | C} is the absolute probability that the child's father
is of genetical type Tᵢ. Finally, denote by P{M : Tᵢ | C(F : Tⱼ)} the
relative probability that the child's mother has genetical type Tᵢ, given
that the father of this child has genetical type Tⱼ, for i, j = 1, 2, ⋯, m.
The definition of P{F : Tᵢ | C(M : Tⱼ)} is similar.

Definition 3·1. We shall say that the system of mating in the population
π is panmixia with respect to the set of pairs of genes Γ if the following two
conditions are satisfied:

(i) whatever i = 1, 2, ⋯, m,

$$P\{M : T_i \mid C\} = p_i' \quad\text{and}\quad P\{F : T_i \mid C\} = p_i'';$$

(ii) whatever j = 1, 2, ⋯, m such that pⱼ′ > 0, and whatever
i = 1, 2, ⋯, m,

$$P\{F : T_i \mid C(M : T_j)\} = P\{F : T_i \mid C\} = p_i''.$$

If the system of mating is not panmixia, it is assortative.

If conditions (i) and (ii) are satisfied, then the application of the theorem
on independence gives

$$P\{M : T_i \mid C(F : T_j)\} = P\{M : T_i \mid C\} = p_i'$$

for every j = 1, 2, ⋯, m such that pⱼ″ > 0 and for every i = 1, 2, ⋯, m.

Condition (i) requires that the proportion of children whose mothers
have a specified genetical type Tᵢ equal the proportion of females in π
having the same genetical type Tᵢ, for i = 1, 2, ⋯, m. Also, there is a
similar requirement regarding fathers.

Condition (ii) requires that the genetical composition of one parent be 
completely independent of the genetical composition of the other parent. 

The reader's attention is called to the fact that the concept of panmixia
is defined in relation to a set Γ of pairs of genes. Thus a given system of
mating may be panmixia with respect to some genes and assortative with
respect to some others.

To illustrate the definition of panmixia consider the simplest case where
the set Γ is composed of only one pair of genes, g and G. There are three
combinations of these genes that an individual can carry, gg, gG, and GG.
In the first two examples we shall consider a population mating π₁ composed
of two females, M₁ and M₂, and of three males, F₁, F₂ and F₃, with
compositions

M₁ : gg,   M₂ : GG,
F₁ : gG,   F₂ : GG,   F₃ : GG.

The distributions of the three genetical types among the females M
and the males F in π₁ are, respectively,

$$\begin{aligned}
p_1' &= P\{M : gg\} = \tfrac12, & p_2' &= P\{M : gG\} = 0, & p_3' &= P\{M : GG\} = \tfrac12, \\
p_1'' &= P\{F : gg\} = 0, & p_2'' &= P\{F : gG\} = \tfrac13, & p_3'' &= P\{F : GG\} = \tfrac23.
\end{aligned}\tag{3·3·2}$$

Now we shall consider two different systems of mating in the population
π₁, one system leading to a set of children S₁(π₁) and the other to a set
S₂(π₁). The set S₁(π₁) is defined to contain six children C₁′ through C₆′
born out of the matings

M₁ × F₁ = C₁′,   M₁ × F₂ = C₂′,   M₁ × F₃ = C₃′,
M₂ × F₁ = C₄′,   M₂ × F₂ = C₅′,   M₂ × F₂ = C₆′.

Thus, M₁ and M₂ have three children each, M₁ by three different fathers
and M₂ by two different fathers. The male F₁ has two children, F₂ has
three children and F₃ only one child. With respect to the set of children
S₁(π₁) considered as the F.P.S., the probabilities of the various genetical
compositions of mothers and fathers are

$$\begin{aligned}
P\{M : gg \mid C\} &= \tfrac36 = \tfrac12, & P\{F : gg \mid C\} &= 0, \\
P\{M : gG \mid C\} &= 0, & P\{F : gG \mid C\} &= \tfrac26 = \tfrac13, \\
P\{M : GG \mid C\} &= \tfrac36 = \tfrac12, & P\{F : GG \mid C\} &= \tfrac46 = \tfrac23.
\end{aligned}$$

It is seen that these six absolute probabilities coincide with the pro¬
portions (3·3·2) characterizing the distribution of genetical types among
the females and among the males of population π₁. It follows that the
process of mating leading to the set of children S₁(π₁) satisfies condition
(i). We now compute the relative probabilities of specified genetical types
of fathers given the type of the mother. Since P{M : gG | C} = 0, the
relative probabilities will have to be computed on only two assumptions,
M : gg and M : GG. We have

$$\begin{aligned}
P\{F : gg \mid C(M : gg)\} &= 0, & P\{F : gg \mid C(M : GG)\} &= 0, \\
P\{F : gG \mid C(M : gg)\} &= \tfrac13, & P\{F : gG \mid C(M : GG)\} &= \tfrac13, \\
P\{F : GG \mid C(M : gg)\} &= \tfrac23, & P\{F : GG \mid C(M : GG)\} &= \tfrac23,
\end{aligned}$$

and it is seen that the genetical compositions of the father are completely
independent of those of the mother. It follows that the system of mating
which leads to the set S₁(π₁) satisfies both conditions (i) and (ii) and,
therefore, is panmixia.

We now consider a second system of mating within population $\pi_1$. Under this system the set of children $S_2(\pi_1)$ is composed of only four children, $C_1''$ through $C_4''$. They are born from the following matings:

$$M_1 \times F_1 = C_1'',\qquad M_1 \times F_2 = C_2'',$$

$$M_2 \times F_1 = C_3'',\qquad M_2 \times F_3 = C_4''.$$

The reader is invited to verify that this system of mating is not panmixia because it does not satisfy condition (i). The reason is that among the children of the set $S_2(\pi_1)$ there are too few with pure dominant fathers: only two of the four children have a GG father, a proportion of $\tfrac{1}{2}$ instead of the $\tfrac{2}{3}$ found among the males of $\pi_1$. However, in this system of mating, the genetical types of one mate are completely independent of the genetical types of the other mate, so that condition (ii) is satisfied.
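Both systems can also be checked mechanically. The sketch below is an illustration, not part of the original argument. It assumes, consistently with the probabilities computed above, that one mother (called $M_1$ here) is gg and the other ($M_2$) is GG, that $F_1$ is gG, and that $F_2$ and $F_3$ are GG; which mother is gg is not stated and does not affect the result. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction

# Assumed genetical types of the members of pi_1 (see the lead-in).
MOTHER_TYPE = {"M1": "gg", "M2": "GG"}
FATHER_TYPE = {"F1": "gG", "F2": "GG", "F3": "GG"}


def proportions(labels):
    """Relative frequency of each label in a list."""
    counts = Counter(labels)
    return {t: Fraction(c, len(labels)) for t, c in counts.items()}


def check_system(matings):
    """Given one (mother, father) pair per child, return two booleans.
    Condition (i): absolute probabilities in the F.P.S. of children equal
    the proportions of types in pi_1.
    Condition (ii): the father's type is independent of the mother's."""
    pairs = [(MOTHER_TYPE[m], FATHER_TYPE[f]) for m, f in matings]
    abs_m = proportions([m for m, _ in pairs])
    abs_f = proportions([f for _, f in pairs])
    cond_i = (abs_m == proportions(list(MOTHER_TYPE.values()))
              and abs_f == proportions(list(FATHER_TYPE.values())))
    joint = proportions(pairs)
    cond_ii = all(joint.get((m, f), 0) == abs_m[m] * abs_f[f]
                  for m in abs_m for f in abs_f)
    return cond_i, cond_ii


S1 = [("M1", "F1"), ("M1", "F2"), ("M1", "F3"),
      ("M2", "F1"), ("M2", "F2"), ("M2", "F2")]
S2 = [("M1", "F1"), ("M1", "F2"), ("M2", "F1"), ("M2", "F3")]

print(check_system(S1))  # (True, True): panmixia
print(check_system(S2))  # (False, True): condition (i) fails
```

Exact fractions are used so that the equality tests in conditions (i) and (ii) are not disturbed by rounding.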

In the above two examples the generations of children were produced from rather special combinations of parents. In order to illustrate a situation that is more typical for human beings, we shall consider a population mating $\pi_2$ composed of 1000 women and 1000 men with the following numbers of each genetical type:



                   gg      gG      GG
No. of women      500     400     100
No. of men        200     300     500


Moreover, we shall assume two-way monogamy and one child per family. This means that the 2000 individuals of population $\pi_2$ form 1000 families, each woman having one husband and each man having one wife, and that each family has one child.

This assumption guarantees that condition (i) in the definition of panmixia is automatically satisfied. Therefore, whether or not a given system of mating is panmixia depends on condition (ii) alone.

We shall consider two different systems of mating, $S_1$ and $S_2$, in the population $\pi_2$. The system of mating $S_1$ is characterized in Table 3-3.


Table 3-3

Distribution of 1000 Families According to Combinations of Genetical Types of Mother and Father Mating under System $S_1$

                          M : gg     M : gG     M : GG     Totals    Absolute Probabilities
F : gg                      100         80         20        200      p1' = .2
F : gG                      150        120         30        300      p2' = .3
F : GG                      250        200         50        500      p3' = .5
Totals                      500        400        100       1000
Absolute Probabilities   p1 = .5    p2 = .4    p3 = .1




The body of the table contains nine cells, each corresponding to a specified combination of genetical types of mother and father. Thus, for example, there are 50 families with both parents dominants GG, and 120 families with both parents hybrids gG. The column of totals gives the numbers of families with fathers of each of the three genetical types. Similarly, the row of totals gives the distribution of families according to the genetical type of the mother. The marginal column gives the absolute probabilities that the father has a specified genetical type. These probabilities are obtained by dividing the number of families in the preceding column by the total 1000. The bottom row gives the absolute probabilities of each of the three genetical types for mothers.

In order to verify whether or not the mating under system $S_1$ is panmixia, it is necessary to verify whether or not each of the genetical compositions of one mate is independent of each composition of the other. For this purpose, we need the relative probabilities that one mate has a specified genetical composition, given the genetical composition of the other mate. These relative probabilities are obtained by dividing the number of families in a particular cell by the total in the marginal row or in the marginal column. Thus, for example,

$$P\{F : gg \mid M : gg\} = \tfrac{100}{500} = .2 = P\{F : gg\} = p_1',$$

$$P\{F : gg \mid M : gG\} = \tfrac{80}{400} = .2 = p_1',$$

$$P\{F : gg \mid M : GG\} = \tfrac{20}{100} = .2 = p_1',$$

etc. Proceeding in this way the reader will easily check that, whatever the specified genetical types $M'$ and $F'$ of mother and father, respectively, the relative probability of $F'$ given $M'$ is equal to the absolute probability of $F'$,

$$P\{F : F' \mid M : M'\} = P\{F : F'\},$$

and vice versa. It follows that the system of mating $S_1$ is panmixia with respect to the genes g, G.
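The check the reader is invited to carry out can be sketched in a few lines. The counts below are those of Table 3-3 (rows: father's type, columns: mother's type); the name `is_panmixia` is an illustrative choice, not anything from the text. Exact fractions avoid rounding doubts in the comparison.

```python
from fractions import Fraction

TYPES = ("gg", "gG", "GG")

# Table 3-3, system S1: TABLE_3_3[father_type][mother_type] = number of families
TABLE_3_3 = {
    "gg": {"gg": 100, "gG": 80, "GG": 20},
    "gG": {"gg": 150, "gG": 120, "GG": 30},
    "GG": {"gg": 250, "gG": 200, "GG": 50},
}


def is_panmixia(table):
    """True if P{F : F' | M : M'} = P{F : F'} for every pair of types.
    Assumes every mother type occurs at least once (nonzero column totals)."""
    total = sum(sum(row.values()) for row in table.values())
    column_total = {m: sum(table[f][m] for f in TYPES) for m in TYPES}
    for f in TYPES:
        absolute = Fraction(sum(table[f].values()), total)
        for m in TYPES:
            relative = Fraction(table[f][m], column_total[m])
            if relative != absolute:
                return False
    return True


print(is_panmixia(TABLE_3_3))  # True
```

Equality of $P\{F : F' \mid M : M'\}$ and $P\{F : F'\}$ in all nine cells already implies $P\{M : M' \mid F : F'\} = P\{M : M'\}$, which is the "vice versa" of the text.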

Consider now another system $S_2$ of mating in the same population $\pi_2$, with the distribution of genetical types of the mates as set up in Table 3-4.

Although the absolute probabilities (also occasionally called "marginal" probabilities) of the three genetical types in each of the mates taken separately are exactly the same under system $S_2$ as under system $S_1$, the relative probabilities of particular genetical compositions of one mate given that of the other are different. For example,

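$$\begin{aligned}
P\{F : gg \mid M : gg\} &= \tfrac{185}{500} = .37 > p_1' = .2, \\
P\{F : gg \mid M : gG\} &= \tfrac{10}{400} = .025 < p_1', \\
P\{F : gg \mid M : GG\} &= \tfrac{5}{100} = .05 < p_1',
\end{aligned} \tag{3-3-3}$$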



Table 3-4

Distribution of 1000 Families According to Combinations of Genetical Types of Mother and Father Mating under System $S_2$

                          M : gg     M : gG     M : GG     Totals    Absolute Probabilities
F : gg                      185         10          5        200      p1' = .2
F : gG                      170        120         10        300      p2' = .3
F : GG                      145        270         85        500      p3' = .5
Totals                      500        400        100       1000
Absolute Probabilities   p1 = .5    p2 = .4    p3 = .1
etc. It follows that the system $S_2$ of mating is assortative rather than panmixia. The situation illustrated by formulae (3-3-3) is described by saying that "recessive males favor recessive females."
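For Table 3-4 the same computation exposes the departure from panmixia. The sketch below is illustrative, with hypothetical names: it reproduces the three relative probabilities in (3-3-3) and sets them beside the absolute probability $p_1' = .2$.

```python
from fractions import Fraction

TYPES = ("gg", "gG", "GG")

# Table 3-4, system S2: TABLE_3_4[father_type][mother_type] = number of families
TABLE_3_4 = {
    "gg": {"gg": 185, "gG": 10, "GG": 5},
    "gG": {"gg": 170, "gG": 120, "GG": 10},
    "GG": {"gg": 145, "gG": 270, "GG": 85},
}


def absolute_father(table, father):
    """P{F : father}: row total divided by the number of families."""
    total = sum(sum(row.values()) for row in table.values())
    return Fraction(sum(table[father].values()), total)


def relative_father(table, father, mother):
    """P{F : father | M : mother}: cell count divided by the mother's column total."""
    column_total = sum(table[f][mother] for f in TYPES)
    return Fraction(table[father][mother], column_total)


p1_prime = absolute_father(TABLE_3_4, "gg")
for mother in TYPES:
    rel = relative_father(TABLE_3_4, "gg", mother)
    print(f"P{{F:gg | M:{mother}}} = {float(rel)}  vs  p1' = {float(p1_prime)}")
```

The printed relative probabilities are .37, .025 and .05, none of them equal to .2, so condition (ii) fails even though the marginal totals agree with those of Table 3-3.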

Studies by Sir Francis Galton and by Karl Pearson indicate that mating 
in human populations is assortative with respect to a number of traits. 
However, in many cases, the deviation from panmixia is very slight, and 
the relative probability of a genetical composition of one of the mates, 
given the composition of the other, differs but little from the absolute 
probability. With plants and some other organisms the assumption of 
panmixia seems perfectly justified. 


★ PROBLEMS AND EXERCISES 


In a population $\pi$ mating under panmixia with respect to two pairs of genes g, G and h, H, the distributions of mothers and fathers according to their genetical types are as follows:

$$P\{M : gg,hh\} = .1,\quad P\{M : gg,hH\} = .2,\quad P\{M : gG,hH\} = .3,\quad P\{M : GG,HH\} = .4;$$

$$P\{F : gg,hh\} = .2,\quad P\{F : gg,hH\} = .3,\quad P\{F : gG,HH\} = .1,\quad P\{F : GG,hH\} = .3,\quad P\{F : GG,HH\} = .1.$$





















1. Determine all the different combinations of genetical types of mates in the population $\pi$. If there are 1000 two-way monogamous families in $\pi$, how many have each specified combination of genetical types of mother and father? Construct a table similar to those describing the systems of mating in the population $\pi_2$ above.

2. In relation to the population $\pi$ described above, compute the probabilities $P\{M : gg\}$ and $P\{M : gg \mid F : gg\}$. Is $M : gg$ independent of $F : gg$?

3-3-3. Successive generations under panmixia with no selection. Case of one pair of genes. Consider a single pair of genes g, G and a sequence of successive generations

$$\Pi_0;\ \pi_1, \Pi_1;\ \pi_2, \Pi_2;\ \cdots;\ \pi_n, \Pi_n;\ \cdots$$

as described in subsection 3-3-1. Since there are only three different genetical types to be distinguished, we shall simplify the notation by using the capital letters $P_n$, $Q_n$ and $R_n$ ($n = 1, 2, \cdots$) to denote the probability that an individual of the $n$th generation born $\Pi_n$ will be a dominant, a hybrid, and a recessive, respectively. It will be assumed that the genes g, G are not "sex linked," that is to say, that the probabilities of the three genetical types in each generation born are the same for males and for females. A further assumption will be that each generation mating $\pi_n$, beginning with $n = 1$, is obtained from the preceding generation born without selection. In other words, it is assumed that the distribution of types in $\pi_n$ is the same as in $\Pi_{n-1}$. It will be assumed that the first generation mating $\pi_1$ is composed of arbitrarily chosen individuals, forming the original population $\Pi_0$, with the distribution of types in females possibly different from that in males. The corresponding probabilities will be denoted as follows:

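$$\begin{array}{lccc}
\text{Types:} & GG & gG & gg \\
\text{Probabilities in } \pi_1 \text{ for females} & p' & q' & r' \\
\text{Probabilities in } \pi_1 \text{ for males} & p'' & q'' & r''
\end{array} \tag{3-3-4}$$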


Our final assumption is that the mating in all generations $\pi_1, \pi_2, \cdots$ is panmixia with respect to the genes g and G. Under these assumptions we will compute the probabilities $P_n$, $Q_n$, and $R_n$. We begin by computing $P_1$, $Q_1$, and $R_1$. To explain the method by which each of these probabilities is obtained, let M, F, and C stand for mother, father, and child in a family with parents from $\pi_1$, and let $C'$ denote any specified genetical composition of C. Then $C : C'$, a property of the child, can be represented as the logical sum of nine mutually exclusive properties, so that its probability





$$\begin{aligned}
P\{C : C'\} = P\{&(M : gg)(F : gg)(C : C') + (M : gg)(F : gG)(C : C') + (M : gg)(F : GG)(C : C') \\
&+ (M : gG)(F : gg)(C : C') + (M : gG)(F : gG)(C : C') + (M : gG)(F : GG)(C : C') \\
&+ (M : GG)(F : gg)(C : C') + (M : GG)(F : gG)(C : C') + (M : GG)(F : GG)(C : C')\}.
\end{aligned}$$

We now apply the addition theorem. Since the nine properties are mutually exclusive, $P\{C : C'\}$ is the sum of nine probabilities of the type $P\{(M : M')(F : F')(C : C')\}$, where $M'$ and $F'$ stand for some specified combinations of genes g, G. We now apply the multiplication theorem to each of the nine probabilities,

$$P\{(M : M')(F : F')(C : C')\} = P\{M : M'\}\, P\{F : F' \mid M : M'\}\, P\{C : C' \mid (M : M')(F : F')\}.$$

This general formula simplifies because of the hypothesis that the system of mating in the population $\pi_1$ is panmixia, namely, whatever $M'$ and $F'$,

$$P\{F : F' \mid M : M'\} = P\{F : F'\}.$$

Therefore,

$$P\{(M : M')(F : F')(C : C')\} = P\{M : M'\}\, P\{F : F'\}\, P\{C : C' \mid (M : M')(F : F')\}.$$

The first probability on the right-hand side is given in the statement of the problem and is either $p'$, $q'$ or $r'$. Similarly, the value of $P\{F : F'\}$ is either $p''$, $q''$ or $r''$. The last probability, $P\{C : C' \mid (M : M')(F : F')\}$, that the child will inherit $C'$ from parents whose genetical composition is known, is of a familiar type.

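The application of this method gives, for $C' = GG$,

$$\begin{aligned}
P_1 = {}& p'p'' \times 1 + p'q'' \times \tfrac12 + p'r'' \times 0 \\
&+ q'p'' \times \tfrac12 + q'q'' \times \tfrac14 + q'r'' \times 0 \\
&+ r'p'' \times 0 + r'q'' \times 0 + r'r'' \times 0,
\end{aligned}$$

or

$$P_1 = (p' + \tfrac12 q')(p'' + \tfrac12 q''). \tag{3-3-5}$$

In a similar fashion

$$\begin{aligned}
Q_1 &= (p' + \tfrac12 q')(r'' + \tfrac12 q'') + (r' + \tfrac12 q')(p'' + \tfrac12 q''), \\
R_1 &= (r' + \tfrac12 q')(r'' + \tfrac12 q'').
\end{aligned} \tag{3-3-6}$$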
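Formulae (3-3-5) and (3-3-6) can be checked by brute force: enumerate the nine combinations of parental types, weight each by $P\{M : M'\}\,P\{F : F'\}$ (the panmixia simplification above), and let each parent pass on either of its two genes with probability 1/2. The sketch assumes types are written as two-letter strings such as "gG"; the function names and the sample distributions are hypothetical.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def child_type(gene_from_mother, gene_from_father):
    """Canonical type string, 'gG' for a hybrid regardless of gene order."""
    # 'g' sorts after 'G' in ASCII, so reverse sorting puts 'g' first.
    return "".join(sorted(gene_from_mother + gene_from_father, reverse=True))


def first_generation_by_enumeration(mothers, fathers):
    """mothers, fathers: dicts {'GG': p, 'gG': q, 'gg': r} for pi_1."""
    born = {"GG": Fraction(0), "gG": Fraction(0), "gg": Fraction(0)}
    for (m_type, p_m), (f_type, p_f) in product(mothers.items(), fathers.items()):
        # Under panmixia P{(M:M')(F:F')} = P{M:M'} P{F:F'}.
        for gm, gf in product(m_type, f_type):
            # each of the four gene pairs has probability 1/2 * 1/2
            born[child_type(gm, gf)] += p_m * p_f * HALF * HALF
    return born


def first_generation_by_formula(mothers, fathers):
    a = mothers["GG"] + HALF * mothers["gG"]   # p' + q'/2
    b = fathers["GG"] + HALF * fathers["gG"]   # p'' + q''/2
    c = mothers["gg"] + HALF * mothers["gG"]   # r' + q'/2
    d = fathers["gg"] + HALF * fathers["gG"]   # r'' + q''/2
    return {"GG": a * b,            # (3-3-5)
            "gG": a * d + c * b,    # (3-3-6)
            "gg": c * d}            # (3-3-6)


mothers = {"GG": Fraction(1, 4), "gG": Fraction(1, 4), "gg": Fraction(1, 2)}
fathers = {"GG": Fraction(1, 3), "gG": Fraction(1, 3), "gg": Fraction(1, 3)}
print(first_generation_by_enumeration(mothers, fathers)
      == first_generation_by_formula(mothers, fathers))   # True
```

The enumeration is exactly the nine-term sum written out above, so agreement for arbitrary inputs is a check on the algebra that collapses it into products.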

These formulae simplify if the distribution of genetical types in $\pi_1$ among fathers is the same as that among mothers, so that

$$p' = p'' = p \text{ (say)},\qquad q' = q'' = q,\qquad r' = r'' = r. \tag{3-3-7}$$

In this case

$$P_1 = (p + \tfrac12 q)^2,\qquad Q_1 = 2(p + \tfrac12 q)(r + \tfrac12 q),\qquad R_1 = (r + \tfrac12 q)^2. \tag{3-3-8}$$

Since $p + q + r = (p + \tfrac12 q) + (r + \tfrac12 q) = 1$, and since $\sqrt{R_1} = r + \tfrac12 q$, so that $p + \tfrac12 q = 1 - \sqrt{R_1}$, it is seen that, whenever the distribution of the three genetical types among fathers is the same as that among mothers, the probabilities of the genetical types of children born out of panmixia are related by the following simple equations

$$P_1 = (1 - \sqrt{R_1})^2,\qquad Q_1 = 2(1 - \sqrt{R_1})\sqrt{R_1}. \tag{3-3-9}$$

Formulae (3-3-9) are very useful, and we shall apply them on several occasions. The reader's attention is called to the fact that for the validity of these formulae it is sufficient (i) that the distribution of genetical types among males of the population mating be the same as that among females, and (ii) that the system of mating be panmixia.

Formulae (3-3-5) and (3-3-6) give a complete solution of the problem of the first generation born out of panmixia.

In order to compute $P_n$, $Q_n$, and $R_n$, we could apply the same method which gave the expressions for $P_1$, $Q_1$, and $R_1$. However, the same results can be obtained more easily by noticing that, because of the lack of selection, the probabilities $P_n$, $Q_n$, and $R_n$ must be connected with $P_{n-1}$, $Q_{n-1}$, and $R_{n-1}$ by the same relations (3-3-8) which connect $P_1$, $Q_1$, and $R_1$ with $p$, $q$, and $r$. Thus, for example,

$$P_2 = (P_1 + \tfrac12 Q_1)^2,\qquad Q_2 = 2(P_1 + \tfrac12 Q_1)(R_1 + \tfrac12 Q_1),\qquad R_2 = (R_1 + \tfrac12 Q_1)^2, \tag{3-3-10}$$



and

$$P_3 = (P_2 + \tfrac12 Q_2)^2,\qquad Q_3 = 2(P_2 + \tfrac12 Q_2)(R_2 + \tfrac12 Q_2),\qquad R_3 = (R_2 + \tfrac12 Q_2)^2, \tag{3-3-11}$$

etc.

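Substituting the expressions (3-3-10) of $P_2$, $Q_2$, and $R_2$ into (3-3-11), and using $P_1 + Q_1 + R_1 = 1$, we obtain a very interesting and important result

$$\begin{aligned}
P_3 &= \left[(P_1 + \tfrac12 Q_1)^2 + (P_1 + \tfrac12 Q_1)(R_1 + \tfrac12 Q_1)\right]^2 \\
&= (P_1 + \tfrac12 Q_1)^2 \,(P_1 + \tfrac12 Q_1 + R_1 + \tfrac12 Q_1)^2 \\
&= (P_1 + \tfrac12 Q_1)^2 = P_2,
\end{aligned}$$

and similarly

$$Q_3 = Q_2 \quad\text{and}\quad R_3 = R_2.$$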

Since the fourth generation born out of panmixia is connected with the third just as the third is connected with the second, etc., it follows that, whatever be the original population $\Pi_0$,

$$P_2 = P_3 = P_4 = \cdots = P_n,\qquad Q_2 = Q_3 = Q_4 = \cdots = Q_n,\qquad R_2 = R_3 = R_4 = \cdots = R_n.$$

In other words, under repeated mating according to panmixia, with no selection between successive generations born, the distribution of the three genetical types determined by one pair of genes not linked with sex is the same in the successive generations born as in the second generation born. The distribution $P_2$, $Q_2$, and $R_2$ in the second generation depends on whether or not, in the first population mating, the distribution of the genetical types among the fathers is the same as that among the mothers. In the general case,

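$$\begin{aligned}
P_2 = (P_1 + \tfrac12 Q_1)^2 &= \left[(p' + \tfrac12 q')(p'' + \tfrac12 q'') + \tfrac12 (p' + \tfrac12 q')(r'' + \tfrac12 q'') + \tfrac12 (r' + \tfrac12 q')(p'' + \tfrac12 q'')\right]^2 \\
&= \left[\frac{p' + p''}{2} + \frac{q' + q''}{4}\right]^2,
\end{aligned}$$

and similarly,

$$R_2 = \left[\frac{r' + r''}{2} + \frac{q' + q''}{4}\right]^2.$$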


It is seen that, if $p' = p''$, $q' = q''$ and $r' = r''$, then $P_2 = P_1$, $Q_2 = Q_1$, and $R_2 = R_1$. Otherwise, however, the distribution of genetical types in the second generation born need not be the same as in the first.

Returning to the system of mating described in Table 3-3 of the preceding subsection, and using the notation of the present subsection, we have

$$p' = .1,\quad q' = .4,\quad r' = .5;\qquad p'' = .5,\quad q'' = .3,\quad r'' = .2.$$

Using formulae (3-3-5) and (3-3-6), it is found that the probabilities of the three genetical types in the first generation born are

$$P_1 = (.3)(.65) = .195,\qquad Q_1 = (.3)(.35) + (.7)(.65) = .560,\qquad R_1 = (.7)(.35) = .245.$$

The distribution of genetical types becomes stabilized in the second generation born, with

$$\begin{aligned}
P_2 &= \cdots = P_n = (.195 + .280)^2 = .226, \\
Q_2 &= \cdots = Q_n = 2(.195 + .280)(.245 + .280) = .499, \\
R_2 &= \cdots = R_n = (.245 + .280)^2 = .276.
\end{aligned}$$
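The numerical example, together with the stabilization from the second generation onward, can be reproduced with exact fractions. In the sketch below, `panmixia_step` is a hypothetical helper implementing (3-3-5) and (3-3-6) for a triple (GG, gG, gg). From the second generation on, both sexes are given the same distribution, as the assumptions of no selection and no sex linkage require.

```python
from fractions import Fraction as Fr


def panmixia_step(female, male):
    """Return (P, Q, R) for the generation born out of panmixia, given
    (p, q, r) = probabilities of GG, gG, gg among mothers and among fathers."""
    a = female[0] + female[1] / 2   # p' + q'/2: mother transmits G
    b = male[0] + male[1] / 2       # p'' + q''/2: father transmits G
    return (a * b, a * (1 - b) + (1 - a) * b, (1 - a) * (1 - b))


# Table 3-3: mothers p' = .1, q' = .4, r' = .5; fathers p'' = .5, q'' = .3, r'' = .2
born = [panmixia_step((Fr(1, 10), Fr(4, 10), Fr(5, 10)),
                      (Fr(5, 10), Fr(3, 10), Fr(2, 10)))]
for _ in range(4):
    # no selection, no sex linkage: pi_{n+1} has the distribution of Pi_n in both sexes
    born.append(panmixia_step(born[-1], born[-1]))

for n, (P, Q, R) in enumerate(born, start=1):
    print(n, round(float(P), 3), round(float(Q), 3), round(float(R), 3))
```

The first printed line is .195, .56, .245. Every later line is .226, .499, .276, the exact values being 361/1600, 798/1600 and 441/1600.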

It is interesting to note the analogy between the problem of the first population born under panmixia and the problem of the bag and boxes solved in Section 2-7. The first population mating $\pi_1$ corresponds to the bag. Since there are nine different combinations of genetical types of mates, the balls in the bag must bear numbers one to nine in order to have a complete analogy. Also, there must be nine boxes, each box corresponding to a determined combination of genetical types of mother and father. The balls filling the boxes correspond to the possible genetical types of the child. In the bag and boxes problem, the balls in the boxes were either black or white. In the case of panmixia, some of the "boxes" must include just one type of balls. This will happen when both parents are homozygous. In other "boxes" there will be two different types of "balls" and in still others, three different types. With these slight changes, the analogy between the two problems is complete.
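The bag-and-boxes picture also suggests a simulation. The sketch below is an illustration with hypothetical names and an arbitrary seed. It draws the mother's and father's types independently, which under panmixia amounts to drawing one of the nine numbered balls from the bag. It then draws the child from the corresponding box by letting each parent pass on one of its two genes at random. With many children the observed frequencies should come close to the exact values .195, .560, .245 of the Table 3-3 example.

```python
import random


def simulate_children(female, male, n_children, seed=1):
    """female, male: dicts {'GG': p, 'gG': q, 'gg': r} for the population mating.
    Returns the observed proportions of GG, gG, gg among n_children children."""
    rng = random.Random(seed)
    types = list(female)
    counts = {t: 0 for t in types}
    for _ in range(n_children):
        # draw a "ball" from the bag: one combination of mother's and father's types
        mother = rng.choices(types, weights=[female[t] for t in types])[0]
        father = rng.choices(types, weights=[male[t] for t in types])[0]
        # draw a "ball" from that box: one gene from each parent
        genes = rng.choice(mother) + rng.choice(father)
        child = "".join(sorted(genes, reverse=True))   # 'Gg' -> 'gG'
        counts[child] += 1
    return {t: c / n_children for t, c in counts.items()}


female = {"GG": 0.1, "gG": 0.4, "gg": 0.5}   # Table 3-3, mothers
male = {"GG": 0.5, "gG": 0.3, "gg": 0.2}     # Table 3-3, fathers
print(simulate_children(female, male, 100_000))
```

Drawing the two parents separately, rather than listing the nine balls explicitly, is a convenience; it gives each ball the probability $P\{M : M'\}\,P\{F : F'\}$ that panmixia assigns to it.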

3-3-4. Successive generations under panmixia and mass selection against recessives.

Definition 3-2. The selection of the $n$th generation mating $\pi_n$ out of the preceding generation born $\Pi_{n-1}$, which consists in including in $\pi_n$ all the dominants and all the hybrids present in $\Pi_{n-1}$ but none of the recessives, is called mass selection against recessives.

The conditions in this definition may be weakened somewhat. Instead of requiring that $\pi_n$ include all the dominants and hybrids present in $\Pi_{n-1}$, it is sufficient to require that the same proportion (e.g., one-half) of each type present in $\Pi_{n-1}$ be included in $\pi_n$.

In the present subsection we will treat the problem of successive generations

$$\Pi_0;\ \pi_1, \Pi_1;\ \pi_2, \Pi_2;\ \cdots;\ \pi_n, \Pi_n;\ \cdots$$

under the following assumptions:

(i) The original generation $\Pi_0$ is born out of panmixia with respect to a single pair of genes g, G which are not linked with sex. Accordingly, by using (3-3-9), the distribution of the three genetical types in $\Pi_0$ is determined by the probabilities

$$P\{GG \mid \Pi_0\} = P_0 = (1 - \sqrt{R_0})^2,\qquad P\{gG \mid \Pi_0\} = Q_0 = 2(1 - \sqrt{R_0})\sqrt{R_0},\qquad P\{gg \mid \Pi_0\} = R_0.$$

(ii) Each generation mating $\pi_k$, $k = 1, 2, \cdots, n, \cdots$, is obtained from the preceding generation born by mass selection against recessives.

(iii) Each generation $\pi_k$, $k = 1, 2, \cdots, n, \cdots$, mates under panmixia.

Conditions (ii) and (iii) amount to stating that, out of each generation born $\Pi_k$, all the recessives are removed and the remaining dominants and hybrids mate according to panmixia.

As in the preceding subsection, the objects of our study are the formulae for the probabilities $P_n$, $Q_n$, and $R_n$ that a member of the $n$th generation born will be a dominant, hybrid or recessive, respectively. Lower-case letters $p_n$, $q_n$, together with $r_n = 0$, will represent the same probabilities in the $n$th generation mating.





We begin by noticing that, because of the absence of linkage with sex and because of panmixia in each generation mating, formulae (3-3-9) may be applied to $P_n$ and $Q_n$,

$$P_n = (1 - \sqrt{R_n})^2,\qquad Q_n = 2(1 - \sqrt{R_n})\sqrt{R_n}, \tag{3-3-12}$$

for $n = 1, 2, \cdots$. Therefore, to solve the problem completely, it is sufficient to compute just one probability for each $\Pi_n$, namely $R_n$, for $n = 1, 2, \cdots$. Since each generation $\Pi_n$ is born out of panmixia in the generation mating $\pi_n$, the probability $R_n$ is connected with $q_n$ and $r_n$, which is zero, by the last of formulae (3-3-8). Thus, for each $n$,

$$R_n = (\tfrac12 q_n)^2.$$

But the generation $\pi_n$ is obtained from $\Pi_{n-1}$ by removing all the recessives and including all the dominants and hybrids present in $\Pi_{n-1}$. Therefore, $q_n$ is, in reality, the relative probability that a member of $\Pi_{n-1}$ is a hybrid, given that it is not a recessive. Consequently, the probability $q_n$ is obtained from $P_{n-1}$, $Q_{n-1}$, and $R_{n-1}$ by a simple application of Theorem 2-5, the theorem on relative probability in subsection 2-6-5. Thus

$$q_n = \frac{Q_{n-1}}{P_{n-1} + Q_{n-1}} = \frac{Q_{n-1}}{1 - R_{n-1}},$$

or, since $Q_{n-1} = 2(1 - \sqrt{R_{n-1}})\sqrt{R_{n-1}}$,

$$q_n = \frac{2\sqrt{R_{n-1}}}{1 + \sqrt{R_{n-1}}}. \tag{3-3-13}$$


Substituting this into the formula connecting $R_n$ with $q_n$, we obtain the relation between the proportions of recessives in any two consecutive generations born,

$$R_n = \frac{R_{n-1}}{(1 + \sqrt{R_{n-1}})^2}.$$


In particular, by successive substitutions,

$$R_1 = \frac{R_0}{(1 + \sqrt{R_0})^2},\qquad R_2 = \frac{R_1}{(1 + \sqrt{R_1})^2} = \frac{R_0}{(1 + 2\sqrt{R_0})^2},\qquad R_3 = \frac{R_2}{(1 + \sqrt{R_2})^2} = \frac{R_0}{(1 + 3\sqrt{R_0})^2}.$$

These three formulae suggest the general law

$$R_n = \frac{R_0}{(1 + n\sqrt{R_0})^2}. \tag{3-3-14}$$


In order to establish the general validity of (3-3-14) for any $n$ we use the method of mathematical induction. Since the validity is already established for $n = 1, 2, 3$, there remains for us to prove, for every $k \ge 1$, that, should formula (3-3-14) be true when $n$ has the value $k$, then it must be true when $n = k + 1$.

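Assume then that

$$R_k = \frac{R_0}{(1 + k\sqrt{R_0})^2}.$$

If so, then $\sqrt{R_k} = \sqrt{R_0}/(1 + k\sqrt{R_0})$, so that $1 + \sqrt{R_k} = \bigl(1 + (k+1)\sqrt{R_0}\bigr)/(1 + k\sqrt{R_0})$, and the reasoning used for $n = 1$ gives

$$R_{k+1} = \frac{R_k}{(1 + \sqrt{R_k})^2} = \frac{R_0}{\bigl(1 + (k+1)\sqrt{R_0}\bigr)^2},$$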


which is again the same formula as (3-3-14) with $n = k + 1$. Thus, formula (3-3-14) is valid for any $n = 1, 2, \cdots$. Using (3-3-12), we obtain

$$P_n = \left(\frac{1 + (n-1)\sqrt{R_0}}{1 + n\sqrt{R_0}}\right)^2,\qquad Q_n = \frac{2\bigl(1 + (n-1)\sqrt{R_0}\bigr)\sqrt{R_0}}{(1 + n\sqrt{R_0})^2}.$$
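The closed forms (3-3-14) and those just obtained for $P_n$ and $Q_n$ can be checked against a step-by-step computation that does not use them: remove the recessives, renormalize (Theorem 2-5), and mate at random. The sketch below is illustrative; the function names and the value $R_0 = .49$ are arbitrary choices.

```python
from math import isclose, sqrt


def one_round(P, Q, R):
    """Mass selection against recessives followed by panmixia.
    (P, Q, R): GG, gG, gg in Pi_{n-1}; returns the same for Pi_n."""
    p, q = P / (P + Q), Q / (P + Q)   # relative probabilities given 'not gg'
    G = p + q / 2                      # chance a parent in pi_n transmits G
    return G * G, 2 * G * (1 - G), (1 - G) ** 2


def closed_form(R0, n):
    """R_n from (3-3-14), with P_n and Q_n as given above."""
    s = sqrt(R0)
    R = R0 / (1 + n * s) ** 2
    P = ((1 + (n - 1) * s) / (1 + n * s)) ** 2
    Q = 2 * (1 + (n - 1) * s) * s / (1 + n * s) ** 2
    return P, Q, R


R0 = 0.49
state = ((1 - sqrt(R0)) ** 2, 2 * (1 - sqrt(R0)) * sqrt(R0), R0)   # Pi_0, by (3-3-9)
for n in range(1, 11):
    state = one_round(*state)
    assert all(isclose(x, y) for x, y in zip(state, closed_form(R0, n)))
print("closed forms agree for n = 1, ..., 10")
```

The step function uses only the biological description (selection, then random mating), so its agreement with the formulae is an independent check of the induction.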

Inspection of formula (3-3-14) shows that, as the number of successive applications of mass selection is increased, the proportion of recessives in the generations born becomes smaller and smaller. In fact, whatever the positive number $\alpha$, however small, it is always possible to find the generation, say $\Pi_{N(\alpha)}$, such that, beginning with this generation and in all that follow, the probability of a recessive will be smaller than $\alpha$. To determine the number $N(\alpha)$ of this generation, it is sufficient to solve the inequality

$$R_n = \frac{R_0}{(1 + n\sqrt{R_0})^2} < \alpha.$$

Taking square roots, $\sqrt{R_0} < \sqrt{\alpha}\,(1 + n\sqrt{R_0})$, and easy algebra gives

$$n > \frac{1}{\sqrt{\alpha}} - \frac{1}{\sqrt{R_0}}. \tag{3-3-15}$$

Thus, for $R_n$ to be less than $\alpha$ it is sufficient that $n$ be greater than the expression on the right-hand side of (3-3-15). It follows that $N(\alpha)$ is the smallest integer exceeding $1/\sqrt{\alpha} - 1/\sqrt{R_0}$.
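The rule for $N(\alpha)$ can be written as a short function and cross-checked by stepping through (3-3-14) until $R_n$ falls below $\alpha$. Both functions are illustrative sketches with hypothetical names; the sample pairs $(R_0, \alpha) = (.81, .01)$ and $(.000{,}001, .000{,}000{,}5)$ are arbitrary.

```python
import math


def generations_needed(R0, alpha):
    """N(alpha): smallest integer exceeding 1/sqrt(alpha) - 1/sqrt(R0), by (3-3-15).
    Returns 0 if R0 is already below alpha."""
    bound = 1 / math.sqrt(alpha) - 1 / math.sqrt(R0)
    return max(0, math.floor(bound) + 1)


def generations_needed_by_search(R0, alpha):
    """The same quantity, found by evaluating R_n = R0 / (1 + n sqrt(R0))**2 directly."""
    n = 0
    while R0 / (1 + n * math.sqrt(R0)) ** 2 >= alpha:
        n += 1
    return n


for R0, alpha in [(0.81, 0.01), (0.000001, 0.0000005)]:
    print(R0, alpha, generations_needed(R0, alpha), generations_needed_by_search(R0, alpha))
```

Both methods give 9 for the first pair and 415 for the second: with $R_0 = .000{,}001$, halving the proportion of recessives takes about 415 rounds of mass selection.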

Mass selection against recessives is frequently applied artificially in plant breeding. Thus, for example, if it is desired to produce pea seeds with predominantly red flowers, it is sufficient to weed out before pollination all the plants which promise to have white flowers. On the other hand, with respect to certain genes, mass selection against recessives is a natural process and goes on for centuries on its own. The recessive trait corresponding to such genes makes it impossible for the organism to live or, at least, to reproduce. An example of such traits in man is said to be complete idiocy.

In this connection, it may be asked how it happens that, in spite of mass selection against recessives for innumerable generations, cases of recessive idiocy are still observed from time to time. This problem is somewhat complex, and a complete explanation seems to depend on so-called "mutation" of genes, which cannot be discussed here. However, a partial answer to the question is provided by a study of the speed of decrease in the value of $R_n$. If $R_0$ is large, say $R_0 = .49$, then a single application of mass selection reduces the proportion of recessives very considerably:

$$R_1 = \frac{.49}{(1 + .7)^2} = \frac{.49}{2.89} \approx .17.$$

On the other hand, if the proportion of recessives is very small, perhaps because of millenniums of mass selection, then one additional application of mass selection does not have any practical effect. Assume, for example, that $R_0 = .000{,}001$. Then

$$R_1 = \frac{.000{,}001}{(1 + .001)^2} = .000{,}000{,}998,$$

which is practically equal to $R_0$.

PROBLEMS AND EXERCISES 

In all the exercises given below it is assumed that none of the genes 
are linked with sex. 

1. In a population $\pi$ the distribution of genetical types is as follows:

Types                        GG     gG     gg
Distribution of females      .1     .5     .4
Distribution of males        .6     .3     .1

Compute the distribution of genetical types in the two successive generations which follow $\pi$ under panmixia and without selection.

Answer: $P_1 = 0.2625$, $Q_1 = 0.5750$, $R_1 = 0.1625$; $P_2 = 0.3025$, $Q_2 = 0.4950$, $R_2 = 0.2025$.




2. In a population $\pi$ the distribution of genetical types GG, gG and gg is $p$, $q$, and $r$, the same among males and females. In the generation $\Pi$ born out of panmixia in $\pi$, the proportion of dominants is four times as great as the proportion of recessives. Determine the distribution of genetical types in $\Pi$, say $P$, $Q$ and $R$.

Answer: $P = Q = 4/9$, $R = 1/9$.

3. Consider the population $\pi$ and the first generation born $\Pi$ in Problem 2. However, instead of assuming that $P = 4R$, assume that the proportions $r$ and $R$ of recessives are known. Determine the probabilities $p$, $q$, $P$ and $Q$. Assume, for example, $R = .36$, $r = .50$.

Partial answer: $P = (1 - \sqrt{R})^2 = 0.16$.

4. In a population of individuals $I$, the distribution of genetical types GG, gG, and gg is given by $p$, $q$, $r$. It is known that this distribution is stable with respect to the system of reproduction consisting of panmixia and no selection. Determine $p$, $q$ and $r$, knowing that

P{/ : G(? 1 J : = i 

Answer: P = ^""1’ 

5. In a population $\pi$ the distribution of the three genetical types with respect to genes g and G is the same among males and females. The proportion of recessives in the population $\Pi$ born out of panmixia in $\pi$ is equal to .81. How many times must the process of mass selection be applied to reduce the proportion of recessives to something less than one percent? Compute the distribution of types in the first generation born satisfying this condition. Partial answer: 9; $P = 0.2770$.

6. A system $S(\alpha)$ of reproduction of successive generations consists of (a) removing a proportion $\alpha$ of all recessives gg born in $\Pi_n$ for each $n$, but including all the remaining individuals of $\Pi_n$ in the next population mating $\pi_{n+1}$; (b) letting individuals in $\pi_{n+1}$ multiply under panmixia. Let the distribution of types GG, gG, and gg in $\Pi_0$, both males and females, be represented by $p = (1 - \sqrt{r})^2$, $q = 2(1 - \sqrt{r})\sqrt{r}$, and $r$, respectively. Deduce formulae giving the distribution of genetical types in the second generation born. Substitute $\alpha = 1$ and verify whether the results coincide with the formulae relating to mass selection against recessives.

7. In successive generations born under panmixia and without selection, the proportion of pure recessives is $R = .64$. It is presumed that the original population mating was composed of pure dominants and recessives. Determine the proportions $p$ and $r$ of these types in the original population. Answer: $p = 0.2$.




3-3-5. Brother-sister mating. The problem considered in this subsection throws a side-light on the origin of morality and its rules. These rules appear as rules of inductive behavior established in relation to vaguely established permanencies.

In all civilized societies it is considered immoral to marry one's sister or daughter. Why should this be more immoral than to marry the sister or daughter of one's neighbor? Marriages between cousins are "less immoral" than marriages between sibs, but still not appropriate. These moral views are fairly old. However, in Biblical times attitudes were different, and marriages between sibs and between parent and child were respectable.

The reason for the change in attitude towards marriages between relatives is ascribable to the fact that such marriages lead to an increased frequency of unsatisfactory progeny. Undoubtedly, this increase in frequency was noted and recorded as a permanency quite early in our history. Since the reasons were not immediately apparent, it was interpreted as punishment for sinful marriages. The discovery of the Mendelian laws gives a rational explanation of this interesting phenomenon. Out of many related problems, we shall treat in detail only the problem of brother-sister mating. Some other problems are given as exercises at the end of this subsection.

Let g, G be a pair of genes unlinked to sex such that the corresponding
recessive trait is detrimental. Assume that successive generations are born 
under mass selection against the recessives and under panmixia among 
the dominants and hybrids, which are indistinguishable. Specifically, 
assume all the hypotheses and also the notation of the preceding subsection. 
Moreover, assume that, given the parents, the genetical composition of 
one child is independent of that of another. 

Consider families composed of three generations. In each family the mother M and father F are members of $\pi_1$, the first generation mating under panmixia. The probabilities that either is a hybrid or a dominant are, using (3-3-13),

$$q_1 = \frac{2\sqrt{R_0}}{1 + \sqrt{R_0}} \qquad\text{and}\qquad p_1 = 1 - q_1 = \frac{1 - \sqrt{R_0}}{1 + \sqrt{R_0}},$$

where $R_0$ stands for the proportion of recessives in $\Pi_0$, the original generation born. M and F have a daughter D and a son S, neither of which is a recessive. D and S marry and produce an offspring Z. Our problem consists in computing $R_Z$, the probability that Z is a recessive.

To compute $R_Z$ we will use the method applied to compute the probabilities of the three genetical types in the first generation born out of panmixia. For this purpose, notice that for the offspring Z to be a recessive gg, given that its parents are not recessives, both of its parents must be hybrids, so that $D : gG$ and $S : gG$. Since it is given that neither M nor F is a recessive, the only combinations of the genetical compositions of M and F which can produce hybrid offspring are

$$(M : GG)(F : gG),\qquad (M : gG)(F : GG)\qquad\text{and}\qquad (M : gG)(F : gG).$$

It follows that

$$\begin{aligned}
R_Z = P\{Z : gg\} = P\{&(M : GG)(F : gG)(D : gG)(S : gG)(Z : gg) \\
&+ (M : gG)(F : GG)(D : gG)(S : gG)(Z : gg) \\
&+ (M : gG)(F : gG)(D : gG)(S : gG)(Z : gg) \mid M, F, D, S : \overline{gg}\},
\end{aligned} \tag{3-3-16}$$

where the symbol $M, F, D, S : \overline{gg}$ stands for the assertion that M, F, D and S are all not recessive.

Applying the addition theorem, the computation of $R_Z$ is reduced to computing the probabilities of the three logical products on the right-hand side of the last formula. We will compute them separately, beginning with the first. Using the multiplication theorem and the assumption that, given the genetical types of parents, the genetical type of one child is independent of that of the other, we have

$$\begin{aligned}
&P\{(M : GG)(F : gG)(D : gG)(S : gG)(Z : gg) \mid M, F, D, S : \overline{gg}\} \\
&\quad = P\{M : GG \mid M : \overline{gg}\}\, P\{F : gG \mid F : \overline{gg}\} \\
&\qquad \times P\{D : gG \mid (M : GG)(F : gG)\}\, P\{S : gG \mid (M : GG)(F : gG)\}\, P\{Z : gg \mid (D : gG)(S : gG)\} \\
&\quad = \frac{1 - \sqrt{R_0}}{1 + \sqrt{R_0}} \cdot \frac{2\sqrt{R_0}}{1 + \sqrt{R_0}} \cdot \frac12 \cdot \frac12 \cdot \frac14 = \frac18 \cdot \frac{\sqrt{R_0} - R_0}{(1 + \sqrt{R_0})^2}.
\end{aligned}$$

Because of the symmetry in M and F, the probability of the second logical product on the right-hand side of (3-3-16) has the same value as that of the first. The computation of the probability of the third logical product is a little more complicated because the cross $(M : gG) \times (F : gG)$ may produce recessive children D and S, and, therefore, in computing the probabilities relating to D and S, the assumption that neither of them is a recessive must be taken into account explicitly. In so doing, it is necessary to use the formulae of subsection 3-2-3, namely,

$$P\{D : gG \mid (M : gG)(F : gG)(D : \overline{gg})\} = \frac{1/2}{3/4} = \frac23,$$

and a similar formula relating to S. Thus

$$P\{(M : gG)(F : gG)(D : gG)(S : gG)(Z : gg) \mid M, F, D, S : \overline{gg}\} = \left(\frac{2\sqrt{R_0}}{1 + \sqrt{R_0}}\right)^2 \left(\frac23\right)^2 \frac14 = \frac{4R_0}{9(1 + \sqrt{R_0})^2}.$$



Finally,

$$R_Z = \frac{\sqrt{R_0} - R_0}{4(1 + \sqrt{R_0})^2} + \frac{4R_0}{9(1 + \sqrt{R_0})^2} = \frac{R_0}{4(1 + \sqrt{R_0})^2}\left(\frac{1}{\sqrt{R_0}} + \frac79\right).$$
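The three-term computation of $R_Z$ can be replayed with exact fractions. To keep the arithmetic rational, the sketch below (an illustration with hypothetical names) takes $s = \sqrt{R_0}$ as its input rather than $R_0$. It follows the text's procedure term by term, including the conditional probability 2/3 for a non-recessive child of two hybrids, and compares the sum with the final closed form.

```python
from fractions import Fraction as Fr


def brother_sister_recessive(s):
    """R_Z for Z = D x S following the three terms of (3-3-16); s = sqrt(R0)."""
    q1 = 2 * s / (1 + s)     # P{M : gG | M not gg} in pi_1, from (3-3-13)
    p1 = 1 - q1              # P{M : GG | M not gg}
    z_gg = Fr(1, 4)          # P{Z : gg | (D : gG)(S : gG)}
    # (M : GG)(F : gG): each child is gG with probability 1/2
    first = p1 * q1 * Fr(1, 2) * Fr(1, 2) * z_gg
    second = first           # (M : gG)(F : GG), by symmetry in M and F
    # (M : gG)(F : gG): a non-recessive child is gG with probability (1/2)/(3/4)
    third = q1 * q1 * Fr(2, 3) * Fr(2, 3) * z_gg
    return first + second + third


def closed_form(s):
    """R0 / (4 (1 + sqrt R0)^2) * (1/sqrt R0 + 7/9)."""
    R0 = s * s
    return R0 / (4 * (1 + s) ** 2) * (1 / s + Fr(7, 9))


for s in (Fr(1, 10), Fr(1, 100), Fr(1, 1000)):
    assert brother_sister_recessive(s) == closed_form(s)
print("three-term sum equals the closed form")
```

Working with $s$ instead of $R_0$ is only a device to avoid irrational square roots; any rational $s$ between 0 and 1 may be used.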

To bring out the increase in frequency of recessives among the progeny of sibs, we will divide $R_Z$ by the probability of a recessive among all the members of the generation to which Z belongs. If $\Pi_0$ is the generation in which M and F are born, then Z belongs to $\Pi_2$, the second generation born, with the probability of a recessive

$$R_2 = \frac{R_0}{(1 + 2\sqrt{R_0})^2}.$$

Thus, the interesting ratio is

$$\frac{R_Z}{R_2} = \left(\frac{1 + 2\sqrt{R_0}}{1 + \sqrt{R_0}}\right)^2 \cdot \frac14\left(\frac{1}{\sqrt{R_0}} + \frac79\right). \tag{3-3-17}$$


It is seen that, if the preceding generations of mass selection have reduced the proportion $R_0$ of recessives to a rather low level, then the first factor in (3-3-17) is slightly greater than unity while the other is greater than $1/(4\sqrt{R_0})$. If $R_0$ is very small, then $1/(4\sqrt{R_0})$ is large and $R_Z$ is considerably larger than $R_2$. For example, if $R_0 = .000{,}001$, then $R_Z$ is roughly 250 times larger than $R_2$.
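Formula (3-3-17) is easy to tabulate. The sketch below is illustrative, and the chosen values of $R_0$ are arbitrary. It shows how the excess risk for the progeny of sibs grows as mass selection drives $R_0$ down, and it reproduces the factor of roughly 250 for $R_0 = .000{,}001$.

```python
from math import sqrt


def sib_to_population_ratio(R0):
    """R_Z / R_2 from (3-3-17)."""
    s = sqrt(R0)
    return ((1 + 2 * s) / (1 + s)) ** 2 * (1 / s + 7 / 9) / 4


for R0 in (0.01, 0.0001, 0.000001):
    print(f"R0 = {R0:g}: R_Z / R_2 = {sib_to_population_ratio(R0):.1f}")
```

The printed ratios are about 3.2, 25.7 and 250.7. Each hundredfold fall in $R_0$ multiplies the ratio by roughly ten, as the dominant factor $1/(4\sqrt{R_0})$ suggests.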


PROBLEMS AND EXERCISES 


1. Father-daughter mating. With the notation and assumptions of subsection 3-3-5, let U stand for the offspring of a father-daughter mating $D \times F = U$. Compute the probability $P\{U : gg\}$.

Answer: $\dfrac{R_0}{(1 + \sqrt{R_0})^2}\left(\dfrac{1}{4\sqrt{R_0}} + \dfrac{5}{12}\right)$.


2. Cousin mating. Consider a sequence of generations with continued mass selection against recessives gg and with panmixia among the dominants and hybrids, as in subsection 3-3-4. Isolate a four-generation family as follows:

              M × F                 Generation π1
      X1 × D        S × X2          Generation π2
             C1 × C2                Generation π3
                Y                   Generation Π3




Individuals M and F, the great-grandparents of Y, belong to the first generation $\pi_1$ mating according to panmixia. The daughter D and son S of M and F select mates $X_1$ and $X_2$ out of $\pi_2$, also according to panmixia. Individuals $C_1$ and $C_2$ are first cousins and belong to $\pi_3$. They marry and produce the offspring Y. None of the individuals M, F, D, S, $X_1$, $X_2$, $C_1$ or $C_2$ is a recessive. Compute the probability $R_Y$ that Y will be a recessive and compare it with $R_3$.

3. Uncle-niece mating. Consider the conditions of the preceding problem and denote by V the progeny of a marriage between the uncle S and niece $C_1$, $S \times C_1 = V$. Obtain the probability $R_V$ that V will be a recessive. Compare this probability with $R_Y$ of the preceding problem to find out which of the two marriages, $C_1 \times C_2$ or $C_1 \times S$, is more "immoral."

4. Successive generations of self-fertilizing plants. It is well known that the flowers of certain plants, such as wheat, can be fertilized by their own pollen only. Cross-fertilization by pollen from other flowers is only possible by artificial means or when the flower is damaged. Artificial cross-fertilization is used in breeding new varieties. Assume that:

(i) The original generation is produced by artificial cross-fertilization so that $Q_0 = 1$ and $P_0 = R_0 = 0$.

(ii) There is no selection.

(iii) In the succeeding generations the maternal organism M and the paternal organism F are the same organism. Compute the distribution $P_n$, $Q_n$, $R_n$ in the $n$th generation born.

Answer: $P_n = R_n = \tfrac12\left(1 - \dfrac{1}{2^n}\right)$, $Q_n = \dfrac{1}{2^n}$.

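The answer to Problem 4 can be checked by iterating the generations directly. The sketch below assumes only what the problem states (no selection, every plant fertilized by its own pollen, generation 0 consisting of hybrids) together with the rule that a self-fertilized hybrid gG yields GG, gG, gg in proportions ¼, ½, ¼; the function name `selfing` is ours, not the book's.

```python
from fractions import Fraction


def selfing(n):
    """Return (P_n, Q_n, R_n), the proportions of GG, gG and gg in the nth
    generation born under continued self-fertilization, starting from
    P_0 = R_0 = 0, Q_0 = 1."""
    P, Q, R = Fraction(0), Fraction(1), Fraction(0)
    for _ in range(n):
        # homozygotes breed true; a selfed hybrid gives GG : gG : gg = 1/4 : 1/2 : 1/4
        P, Q, R = P + Q / 4, Q / 2, R + Q / 4
    return P, Q, R


for n in range(1, 5):
    print(n, *selfing(n))
```

Exact fractions are used so that the comparison with $1/2^n$ involves no rounding.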
5. It is sometimes asserted that a pair of genes g, G exists which is such that an individual with the combination gg is a “genius.” Whether or not this assertion has any connection with reality, the following problems assume that it is true. Assume also that successive generations are born under panmixia, that there is no selection, and that the genes are unlinked with sex. Let $P_0$ be the probability of the gene combination gg in the original generation born.

(a) Compute the probability $R_z$ that the progeny z of a brother-sister mating will be a “genius.”

Answer: $R_z = \tfrac14\sqrt{P_0}\,(1 + 3\sqrt{P_0})$.

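Answer 5(a) can be confirmed by exact enumeration. The sketch below assumes that the grandparents M and F of z are drawn independently from a population in Hardy-Weinberg proportions with gene frequency $q = \sqrt{P_0}$ for g (as implied by panmixia without selection), and uses the fact that two full sibs receive their genes independently once their parents are fixed. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def hw_genotypes(q):
    """Hardy-Weinberg distribution of the number of g genes (0, 1, 2)
    carried by an individual when the gene g has frequency q."""
    return {0: (1 - q) ** 2, 1: 2 * q * (1 - q), 2: q ** 2}


def sib_mating_recessive(q):
    """Probability that the child z of a brother-sister mating is gg."""
    dist = hw_genotypes(q)
    total = Fraction(0)
    for (kM, pM), (kF, pF) in product(dist.items(), repeat=2):
        tM, tF = Fraction(kM, 2), Fraction(kF, 2)  # chance each grandparent passes on g
        # a sib passes on g with probability (number of its g genes)/2,
        # whose expectation given M and F is (tM + tF)/2
        t_sib = (tM + tF) / 2
        # the two sibs are independent given M and F, so z is gg w.p. t_sib**2
        total += pM * pF * t_sib ** 2
    return total


q = Fraction(1, 10)
print(sib_mating_recessive(q), q * (1 + 3 * q) / 4)  # 13/400 13/400
```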




6. Compute $R_m = P\{m : gg\}$, the probability that $m = D \times F$ is a “genius,” as in 5(a) and as in 5(b).

Answer to 6(a): $R_m = \tfrac14\sqrt{P_0}\,(1 + 3\sqrt{P_0})$.

7. Compute $R_y = P\{y : gg\}$, the probability that $y = C_1 \times C_2$ is a “genius,” when $C_1$ and $C_2$ are two cousins (as drawn in Problem 2).

8. Compute $R_v = P\{v : gg\}$, the probability that $v = S \times C_1$ is a “genius,” when S stands for uncle and $C_1$ for niece (as drawn in Problem 2).

9. (a) Determine the probability that a man who has a recessive brother will have a recessive child. Make the same assumptions as in Problem 2.
(b) What will the probability be if his wife also has a recessive brother?
Put $R_0 = .000{,}001$ and compare these two probabilities with $R_2$.

Answer: (a) ~ + 2 y/~R ) ~ 333.3/22 5 (b) g = 111,556722 . 

10. Under the assumptions of Problem 2, find the probability that a man who has a recessive uncle will have a recessive child.

Answer: $\dfrac{\sqrt{R_0}\,(3 + 11\sqrt{R_0})}{18(1 + 2\sqrt{R_0})(1 + 3\sqrt{R_0})}$.

★3·3·6. Successive generations under panmixia with no selection. Case of two pairs of genes. The problem treated in this subsection is perfectly analogous to that in subsection 3·3·3 except that, instead of one pair of genes, we will now consider two pairs of genes, say g, G and h, H, not linked with sex. It is assumed that the successive generations mate under panmixia with respect to these genes and that there is no selection. In these circumstances, the results of subsection 3·3·3 might lead one to expect that, after a few generations, the distribution of genetical types will become stabilized. It will be seen that, in general, this presumption is not correct.

★In order to treat the problem in its full generality, it will be assumed that the two pairs of genes may be linked. As before, the letter W denotes the probability that the chromosome pair will break between the loci of g, G and h, H. The case of no linkage will correspond to the value $W = 1$.

★As was pointed out in subsection 3·2·5, with two pairs of linked genes, there will be ten different genetical types to distinguish, say

$$
\begin{array}{llll}
T_1 = gg,hh & T_2 = gg,hH & T_3 = gg,HH & \\
T_4 = gG,hh & T_5 = gG,hH & T_6 = gG,Hh & T_7 = gG,HH\\
T_8 = GG,hh & T_9 = GG,hH & T_{10} = GG,HH. &
\end{array}
$$




If the genes considered are not linked, then instead of the separate types $T_5$ and $T_6$, there will be just one, which we may denote by $(T_5 + T_6)$. We will assume that the first population mating, $\pi_1$, is arbitrary, with possibly different distributions of the ten genetical types among mothers and fathers. Accordingly, the probability that a mother M in the population $\pi_1$ has the genetical type $T_i$ will be denoted by $p_i$. The same probability for the father will be denoted by $q_i$, $i = 1, 2, \ldots, 10$. We will use the symbol $P_n(i)$ to denote the probability that a member of the nth generation born will be of type $T_i$. Because there is no linkage with sex and because of the lack of selection, this will also be the probability that the mother (father) in the next generation mating will be of type $T_i$. The problem of this subsection is to compute the probabilities $P_n(i)$. The distribution of types in the first generation, $P_1(i)$, may be obtained by the method which was used in the case of only one pair of genes. Whatever be $i = 1, 2, \ldots, 10$, we write

$$
\begin{aligned}
P_1(i) = P\{&(M:T_1)(F:T_1)(C:T_i) + (M:T_1)(F:T_2)(C:T_i) + \cdots + (M:T_1)(F:T_{10})(C:T_i)\\
&+ \cdots\\
&+ (M:T_{10})(F:T_1)(C:T_i) + (M:T_{10})(F:T_2)(C:T_i) + \cdots + (M:T_{10})(F:T_{10})(C:T_i)\}
\end{aligned}
\tag{3·3·18}
$$

and then apply the addition and multiplication theorems. Just as in the case of one pair of genes, the probability $P_1(i)$ is a sum of probabilities, one for each of the 100 combinations of the types of M and F. We write this sum briefly as

$$
P_1(i) = \sum_{m=1}^{10}\sum_{n=1}^{10} P\{(M:T_m)(F:T_n)(C:T_i)\}. \tag{3·3·19}
$$

Even though we use this abbreviated notation, we think of the expression $P_1(i)$ as being written in ten rows and ten columns as in the original formula (3·3·18). Taking account of panmixia, each probability on the right-hand side of (3·3·19) can be written as

$$
\begin{aligned}
P\{(M:T_m)(F:T_n)(C:T_i)\} &= P\{M:T_m\}\,P\{F:T_n\}\,P\{C:T_i \mid (M:T_m)(F:T_n)\}\\
&= p_m q_n\,P\{C:T_i \mid (M:T_m)(F:T_n)\}.
\end{aligned}
\tag{3·3·20}
$$

The last probability is of the type computed in subsection 3·2·6. Let $X_1$ and $Y_1$ stand for the maternal and paternal reproductive cells from the first generation mating, the cells which combine to give C, a member of the first generation born. $X_1$ carries one of the four couples of genes

$$
x_1 = gh,\qquad x_2 = gH,\qquad x_3 = Gh,\qquad x_4 = GH, \tag{3·3·21}
$$

and the same is true for $Y_1$. $T_i$ is obtained by combining the couple of genes carried by $X_1$ with the couple carried by $Y_1$. The four types

$$
T_1 = (x_1, x_1),\qquad T_3 = (x_2, x_2),\qquad T_8 = (x_3, x_3),\qquad T_{10} = (x_4, x_4)
$$

are combinations of identical couples. All the other types are combinations of two different couples, for example,

$$
T_2 = (x_1, x_2).
$$

In this case, C will be of type $T_2$ if $X_1$ carries $x_1$ and $Y_1$ carries $x_2$, or vice versa.

★If we let $x'$ and $x''$ denote any two different couples of genes in (3·3·21), then we may write either

$$
P\{C:T_i \mid (M:T_m)(F:T_n)\} = P\{X_1 : x' \mid M:T_m\}\,P\{Y_1 : x' \mid F:T_n\} \tag{3·3·22}
$$

for $T_i = (x', x')$, or

$$
\begin{aligned}
P\{C:T_i \mid (M:T_m)(F:T_n)\} ={}& P\{X_1 : x' \mid M:T_m\}\,P\{Y_1 : x'' \mid F:T_n\}\\
&+ P\{X_1 : x'' \mid M:T_m\}\,P\{Y_1 : x' \mid F:T_n\}
\end{aligned}
\tag{3·3·23}
$$

for $T_i = (x', x'')$. Substituting (3·3·22) into (3·3·20) and then the result of this substitution into (3·3·19), we obtain $P_1(i)$ for the case $T_i = (x', x')$,

$$
P_1(i) = \sum_{m=1}^{10}\sum_{n=1}^{10} p_m q_n\,P\{X_1 : x' \mid M:T_m\}\,P\{Y_1 : x' \mid F:T_n\},
$$

for $i = 1, 3, 8$, and 10. In order to simplify this formula it is convenient to imagine that the double sum on the right-hand side is written in ten rows and ten columns as in (3·3·18). All terms of the first row will have $m = 1$ and, therefore, will have a common factor

$$
p_1 P\{X_1 : x' \mid M:T_1\}.
$$

Similarly, all terms in any other row, say the mth, will have the factor

$$
p_m P\{X_1 : x' \mid M:T_m\}
$$

in common. Thus, the expression for $P_1(i)$ can be written as
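$$
\begin{aligned}
P_1(i) ={}& p_1 P\{X_1 : x' \mid M:T_1\}\bigl(q_1 P\{Y_1 : x' \mid F:T_1\} + q_2 P\{Y_1 : x' \mid F:T_2\} + \cdots + q_{10} P\{Y_1 : x' \mid F:T_{10}\}\bigr)\\
&+ p_2 P\{X_1 : x' \mid M:T_2\}\bigl(q_1 P\{Y_1 : x' \mid F:T_1\} + q_2 P\{Y_1 : x' \mid F:T_2\} + \cdots + q_{10} P\{Y_1 : x' \mid F:T_{10}\}\bigr)\\
&+ \cdots\\
&+ p_{10} P\{X_1 : x' \mid M:T_{10}\}\bigl(q_1 P\{Y_1 : x' \mid F:T_1\} + q_2 P\{Y_1 : x' \mid F:T_2\} + \cdots + q_{10} P\{Y_1 : x' \mid F:T_{10}\}\bigr).
\end{aligned}
$$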

The expression in large parentheses is the same in each row, namely

$$
\sum_{n=1}^{10} q_n P\{Y_1 : x' \mid F:T_n\}.
$$

Factoring out this expression, the formula for $P_1(i)$ becomes

$$
P_1(i) = \sum_{m=1}^{10} p_m P\{X_1 : x' \mid M:T_m\} \sum_{n=1}^{10} q_n P\{Y_1 : x' \mid F:T_n\}. \tag{3·3·24}
$$

This formula applies to all the genetical types $T_i$ which require identical contributions $(x', x')$ from both the maternal and paternal reproductive cells, that is, for $i = 1, 3, 8$, and 10. An analogous formula is deduced for the case where $T_i$ requires two different contributions $x'$ and $x''$ from $X_1$ and $Y_1$. In this case, formula (3·3·23) is substituted into (3·3·20) and the result into (3·3·19). Continuing the reasoning of the previous case, we find

$$
\begin{aligned}
P_1(i) ={}& \sum_{m=1}^{10} p_m P\{X_1 : x' \mid M:T_m\} \sum_{n=1}^{10} q_n P\{Y_1 : x'' \mid F:T_n\}\\
&+ \sum_{m=1}^{10} p_m P\{X_1 : x'' \mid M:T_m\} \sum_{n=1}^{10} q_n P\{Y_1 : x' \mid F:T_n\},
\end{aligned}
\tag{3·3·25}
$$

for the cases $i = 2, 4, 5, 6, 7$, and 9. Notice that in both cases the probability $P_1(i)$ is determined by sums of the type

$$
\sum_{m=1}^{10} p_m P\{X_1 : x' \mid M:T_m\} \quad\text{and}\quad \sum_{n=1}^{10} q_n P\{Y_1 : x' \mid F:T_n\}. \tag{3·3·26}
$$

The reader will have no difficulty in noticing that the first of these sums represents the absolute probability $P\{X_1 : x'\}$ that the maternal cell producing one of the members of the first generation born will carry the specified couple of genes $x'$. The second sum represents a similarly defined probability $P\{Y_1 : x'\}$ relating to the paternal reproductive cell $Y_1$. Since there are four possible compositions of a reproductive cell, namely those exhibited in (3·3·21), there will be four such probabilities for $X_1$. Let us denote them by $A_1$, $B_1$, $C_1$, and $D_1$, respectively. The corresponding probabilities relating to $Y_1$ will be denoted by $a_1$, $b_1$, $c_1$, and $d_1$. The computation of these probabilities is conducted in two steps. First we use the method of subsection 3·2·5 to compute probabilities of the type $P\{X_1 : x' \mid M : T_m\}$, and then substitute the results into (3·3·26). We obtain

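$$
\begin{aligned}
A_1 &= P\{X_1 : gh\} = p_1 + \tfrac12 p_2 + \tfrac12 p_4 + \tfrac12 p_5\bigl(1 - \tfrac12 W\bigr) + \tfrac14 p_6 W,\\
B_1 &= P\{X_1 : gH\} = \tfrac12 p_2 + p_3 + \tfrac14 p_5 W + \tfrac12 p_6\bigl(1 - \tfrac12 W\bigr) + \tfrac12 p_7,\\
C_1 &= P\{X_1 : Gh\} = \tfrac12 p_4 + \tfrac14 p_5 W + \tfrac12 p_6\bigl(1 - \tfrac12 W\bigr) + p_8 + \tfrac12 p_9,\\
D_1 &= P\{X_1 : GH\} = \tfrac12 p_5\bigl(1 - \tfrac12 W\bigr) + \tfrac14 p_6 W + \tfrac12 p_7 + \tfrac12 p_9 + p_{10}.
\end{aligned}
\tag{3·3·27}
$$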

★The formulae for $a_1$, $b_1$, $c_1$, and $d_1$ are similar except that the p's are replaced by the q's.

★Once $A_1, B_1, C_1, D_1$ and $a_1, b_1, c_1, d_1$ are calculated from the data of the problem, the probabilities $P_1(i)$ are obtained from (3·3·24) and (3·3·25) as follows:

$$
\begin{aligned}
P_1(1) &= A_1a_1,\\
P_1(2) &= A_1b_1 + B_1a_1,\\
P_1(3) &= B_1b_1,\\
P_1(4) &= A_1c_1 + C_1a_1,\\
P_1(5) &= A_1d_1 + D_1a_1,\\
P_1(6) &= B_1c_1 + C_1b_1,\\
P_1(7) &= B_1d_1 + D_1b_1,\\
P_1(8) &= C_1c_1,\\
P_1(9) &= C_1d_1 + D_1c_1,\\
P_1(10) &= D_1d_1.
\end{aligned}
\tag{3·3·28}
$$

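The two-step computation of (3·3·27) and (3·3·28) can be sketched as follows. The function names are hypothetical; list position $i - 1$ holds the probability of $T_i$, and the distributions p and q are arbitrary illustrative numbers summing to one, not data from the text.

```python
def gamete_probs(p, W):
    """(3.3.27): probabilities that a parent drawn from the type distribution
    p = [p1, ..., p10] transmits the couples gh, gH, Gh, GH (returned as
    A, B, C, D). W is the probability that the chromosome pair breaks
    between the two loci."""
    p1, p2, p3, p4, p5, p6, p7, p8, p9, p10 = p
    A = p1 + p2 / 2 + p4 / 2 + p5 / 2 * (1 - W / 2) + p6 * W / 4
    B = p2 / 2 + p3 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p7 / 2
    C = p4 / 2 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p8 + p9 / 2
    D = p5 / 2 * (1 - W / 2) + p6 * W / 4 + p7 / 2 + p9 / 2 + p10
    return A, B, C, D


def offspring_dist(mat, pat):
    """(3.3.28): distribution of T1, ..., T10 among the offspring when the
    maternal cell carries gh, gH, Gh, GH with probabilities mat = (A, B, C, D)
    and the paternal cell with probabilities pat = (a, b, c, d)."""
    A, B, C, D = mat
    a, b, c, d = pat
    return [A * a, A * b + B * a, B * b,
            A * c + C * a, A * d + D * a, B * c + C * b, B * d + D * b,
            C * c, C * d + D * c, D * d]


# illustrative (hypothetical) distributions among mothers and fathers of pi_1
p = [0.1, 0.0, 0.2, 0.0, 0.3, 0.1, 0.0, 0.1, 0.0, 0.2]
q = [0.4, 0.1, 0.0, 0.0, 0.0, 0.2, 0.1, 0.0, 0.0, 0.2]
W = 0.5
P1 = offspring_dist(gamete_probs(p, W), gamete_probs(q, W))
print([round(x, 4) for x in P1], round(sum(P1), 10))
```

Keeping the gamete step separate from the zygote step mirrors the text: only the first depends on W.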
The same kind of reasoning leads to the formulae determining the distribution $P_n(i)$ of genetical types in the nth generation born. Let $X_n$ and $Y_n$ be the reproductive cells from the nth generation mating which combine to form an individual of the nth generation born. Further, for $n > 1$, let

$$
\begin{aligned}
A_n &= P\{X_n : gh\} = P\{Y_n : gh\},\\
B_n &= P\{X_n : gH\} = P\{Y_n : gH\},\\
C_n &= P\{X_n : Gh\} = P\{Y_n : Gh\},\\
D_n &= P\{X_n : GH\} = P\{Y_n : GH\}.
\end{aligned}
$$




Then the probabilities $P_n(i)$ are obtained from $A_n$, $B_n$, $C_n$, and $D_n$ just as the $P_1(i)$ were obtained from $A_1, \ldots, d_1$. The formulae are similar except that, because there is no linkage with sex, the probabilities relating to paternal and maternal reproductive cells in the nth generation mating, $n > 1$, are equal.

★Formulae giving $P_n(i)$ will be written in a three-by-three table which exhibits the distribution of the combinations of the four genes. In the margins is shown the distribution relating to each pair of genes taken separately. Let $C_{(n)}$ denote the child born to the nth generation mating; that is, $C_{(n)}$ is a member of the nth generation born and thus $P_n(i) = P\{C_{(n)} : T_i\}$. It is hoped that the reader will not confuse $C_{(n)}$ with the previously introduced $C_n$. If the two pairs of genes are not linked, then the middle cell of the table contains only one instead of two probabilities, namely

$$
P\{C_{(n)} : (gG,hH + gG,Hh)\} = 2(A_nD_n + B_nC_n).
$$

The marginal column and row give the probabilities that a member of the nth generation born will be a recessive, a hybrid, or a dominant with respect to one of the pairs of genes considered.

★The probabilities $P_n(i)$ are now expressed in terms of the probabilities $A_n$, $B_n$, $C_n$, and $D_n$, which are as yet unknown. To determine them, we establish the relation brought about by passing from one generation to the next. Using the same reasoning as was used to obtain (3·3·27), we have

$$
A_2 = P\{X_2 : gh\} = P_1(1) + \tfrac12 P_1(2) + \tfrac12 P_1(4) + \tfrac12 P_1(5)\bigl(1 - \tfrac12 W\bigr) + \tfrac14 P_1(6)W.
$$

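Using (3·3·28), this relation becomes

$$
\begin{aligned}
A_2 &= A_1a_1 + \tfrac12(A_1b_1 + B_1a_1) + \tfrac12(A_1c_1 + C_1a_1) + \tfrac12(A_1d_1 + D_1a_1)\bigl(1 - \tfrac12 W\bigr) + \tfrac14(B_1c_1 + C_1b_1)W\\
&= \tfrac12 A_1(a_1 + b_1 + c_1 + d_1) + \tfrac12 a_1(A_1 + B_1 + C_1 + D_1) - \tfrac14 W(A_1d_1 + D_1a_1 - B_1c_1 - C_1b_1).
\end{aligned}
$$

Noticing that

$$
a_1 + b_1 + c_1 + d_1 = A_1 + B_1 + C_1 + D_1 = 1,
$$

and denoting

$$
\tfrac12(A_1d_1 + D_1a_1 - B_1c_1 - C_1b_1) = r_1, \tag{3·3·29}
$$

we finally obtain

$$
A_2 = \tfrac12(A_1 + a_1) - \tfrac12 W r_1. \tag{3·3·30}
$$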






Similarly,

$$
\begin{aligned}
B_2 &= \tfrac12(B_1 + b_1) + \tfrac12 W r_1,\\
C_2 &= \tfrac12(C_1 + c_1) + \tfrac12 W r_1,\\
D_2 &= \tfrac12(D_1 + d_1) - \tfrac12 W r_1.
\end{aligned}
\tag{3·3·31}
$$

Since there is no linkage with sex, $A_2 = a_2$, $B_2 = b_2$, $C_2 = c_2$, $D_2 = d_2$, and these formulae give the probabilities for both the maternal and the paternal reproductive cells. Just because of this lack of distinction between the probabilities for maternal and paternal reproductive cells, the formulae giving $A_n$, $B_n$, $C_n$, and $D_n$ in terms of $A_{n-1}$, $B_{n-1}$, $C_{n-1}$, $D_{n-1}$ are somewhat simpler. When $n > 2$, the formulae corresponding to (3·3·30) and (3·3·31) become

$$
\begin{aligned}
A_n &= A_{n-1} - \tfrac12 W r_{n-1}, & B_n &= B_{n-1} + \tfrac12 W r_{n-1},\\
C_n &= C_{n-1} + \tfrac12 W r_{n-1}, & D_n &= D_{n-1} - \tfrac12 W r_{n-1}.
\end{aligned}
$$

Using (3·3·29), we have

$$
r_n = A_nD_n - B_nC_n \quad\text{for } n = 2, 3, \ldots.
$$

Also, it is easy to verify that $r_2$ is related to $r_1$ by the equation

$$
r_2 = A_2D_2 - B_2C_2 = \tfrac14\bigl[A_1D_1 - B_1C_1 + a_1d_1 - b_1c_1 + 2(1 - W)r_1\bigr]. \tag{3·3·32}
$$

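A quick numerical check of (3·3·30) through (3·3·32): compute the cells of the second generation mating directly from $P_1(i)$ and compare them with the formulae. The two helper functions restate (3·3·27) and (3·3·28); the input distributions and the value of W are hypothetical.

```python
def gamete_probs(p, W):
    """(3.3.27): P{gh}, P{gH}, P{Gh}, P{GH} for the cell of a parent drawn
    from the type distribution p = [p1, ..., p10]."""
    p1, p2, p3, p4, p5, p6, p7, p8, p9, p10 = p
    A = p1 + p2 / 2 + p4 / 2 + p5 / 2 * (1 - W / 2) + p6 * W / 4
    B = p2 / 2 + p3 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p7 / 2
    C = p4 / 2 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p8 + p9 / 2
    D = p5 / 2 * (1 - W / 2) + p6 * W / 4 + p7 / 2 + p9 / 2 + p10
    return A, B, C, D


def offspring_dist(mat, pat):
    """(3.3.28): distribution of T1, ..., T10 among the offspring."""
    A, B, C, D = mat
    a, b, c, d = pat
    return [A * a, A * b + B * a, B * b, A * c + C * a, A * d + D * a,
            B * c + C * b, B * d + D * b, C * c, C * d + D * c, D * d]


# hypothetical distributions among mothers (p) and fathers (q) of pi_1
p = [0.1, 0.0, 0.2, 0.0, 0.3, 0.1, 0.0, 0.1, 0.0, 0.2]
q = [0.4, 0.1, 0.0, 0.0, 0.0, 0.2, 0.1, 0.0, 0.0, 0.2]
W = 0.5

A1, B1, C1, D1 = gamete_probs(p, W)
a1, b1, c1, d1 = gamete_probs(q, W)
r1 = (A1 * d1 + D1 * a1 - B1 * c1 - C1 * b1) / 2                    # (3.3.29)

# second mating: both sexes have the first-generation distribution P_1
P1 = offspring_dist((A1, B1, C1, D1), (a1, b1, c1, d1))
A2, B2, C2, D2 = gamete_probs(P1, W)

predicted = ((A1 + a1) / 2 - W * r1 / 2,                             # (3.3.30)
             (B1 + b1) / 2 + W * r1 / 2,                             # (3.3.31)
             (C1 + c1) / 2 + W * r1 / 2,
             (D1 + d1) / 2 - W * r1 / 2)
r2_direct = A2 * D2 - B2 * C2
r2_formula = (A1 * D1 - B1 * C1 + a1 * d1 - b1 * c1
              + 2 * (1 - W) * r1) / 4                                # (3.3.32)
print(predicted, (A2, B2, C2, D2))
print(r2_direct, r2_formula)
```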
Now $A_n$, $B_n$, $C_n$, and $D_n$ can be expressed in terms of $A_2$, $B_2$, $C_2$, and $D_2$. For this purpose we apply the formulae just obtained and write

$$
\begin{aligned}
A_3 &= A_2 - \tfrac12 W r_2,\\
A_4 &= A_3 - \tfrac12 W r_3,\\
&\;\;\vdots\\
A_n &= A_{n-1} - \tfrac12 W r_{n-1}.
\end{aligned}
$$

★Upon adding these expressions and canceling, we obtain

$$
A_n = A_2 - \tfrac12 W(r_2 + r_3 + \cdots + r_{n-1}).
$$



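Similarly,

$$
\begin{aligned}
B_n &= B_2 + \tfrac12 W(r_2 + r_3 + \cdots + r_{n-1}),\\
C_n &= C_2 + \tfrac12 W(r_2 + r_3 + \cdots + r_{n-1}),\\
D_n &= D_2 - \tfrac12 W(r_2 + r_3 + \cdots + r_{n-1}).
\end{aligned}
$$

It is seen that the evaluation of $A_n$, $B_n$, $C_n$, and $D_n$ in terms of $A_2$, $B_2$, $C_2$, and $D_2$ depends upon the evaluation of $r_2 + r_3 + \cdots + r_{n-1}$. Using the definition of $r_m$, we find, for every $m > 2$,

$$
\begin{aligned}
r_m &= \bigl(A_{m-1} - \tfrac12 W r_{m-1}\bigr)\bigl(D_{m-1} - \tfrac12 W r_{m-1}\bigr) - \bigl(B_{m-1} + \tfrac12 W r_{m-1}\bigr)\bigl(C_{m-1} + \tfrac12 W r_{m-1}\bigr)\\
&= A_{m-1}D_{m-1} - B_{m-1}C_{m-1} - \tfrac12 W r_{m-1}(A_{m-1} + B_{m-1} + C_{m-1} + D_{m-1}),
\end{aligned}
$$

the terms in $W^2 r_{m-1}^2$ canceling, and, since the sum in the parentheses equals unity,

$$
r_m = r_{m-1}\bigl(1 - \tfrac12 W\bigr).
$$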

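By repeated substitution we obtain

$$
r_n = r_{n-1}\bigl(1 - \tfrac12 W\bigr) = r_{n-2}\bigl(1 - \tfrac12 W\bigr)^2 = \cdots = r_2\bigl(1 - \tfrac12 W\bigr)^{n-2} \tag{3·3·33}
$$

and, for $W > 0$,

$$
\begin{aligned}
r_2 + r_3 + \cdots + r_{n-1} &= r_2\Bigl[1 + \bigl(1 - \tfrac12 W\bigr) + \bigl(1 - \tfrac12 W\bigr)^2 + \cdots + \bigl(1 - \tfrac12 W\bigr)^{n-3}\Bigr]\\
&= r_2\,\frac{1 - \bigl(1 - \tfrac12 W\bigr)^{n-2}}{\tfrac12 W}.
\end{aligned}
$$

By substituting this result into the formula for $A_n$, we find

$$
A_n = A_2 - r_2\Bigl[1 - \bigl(1 - \tfrac12 W\bigr)^{n-2}\Bigr]
$$

or, by using (3·3·33),

$$
A_n = A_2 - r_2 + r_n,
$$

and similarly

$$
B_n = B_2 + r_2 - r_n,\qquad C_n = C_2 + r_2 - r_n,\qquad D_n = D_2 - r_2 + r_n.
$$

These formulae also hold when $W = 0$, since then $r_n = r_2$ and $A_n = A_2$, $B_n = B_2$, $C_n = C_2$, $D_n = D_2$ for every n.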




Substituting these results in the expressions for $P_n(i)$ we obtain, for every $n = 2, 3, \ldots$,

$$
\begin{aligned}
P_n(1) &= (A_2 - r_2 + r_n)^2,\\
P_n(2) &= 2(A_2 - r_2 + r_n)(B_2 + r_2 - r_n),\\
P_n(3) &= (B_2 + r_2 - r_n)^2,\\
P_n(4) &= 2(A_2 - r_2 + r_n)(C_2 + r_2 - r_n),\\
P_n(5) &= 2(A_2 - r_2 + r_n)(D_2 - r_2 + r_n),\\
P_n(6) &= 2(B_2 + r_2 - r_n)(C_2 + r_2 - r_n),\\
P_n(7) &= 2(B_2 + r_2 - r_n)(D_2 - r_2 + r_n),\\
P_n(8) &= (C_2 + r_2 - r_n)^2,\\
P_n(9) &= 2(C_2 + r_2 - r_n)(D_2 - r_2 + r_n),\\
P_n(10) &= (D_2 - r_2 + r_n)^2.
\end{aligned}
\tag{3·3·34}
$$

Combined with (3·3·29) through (3·3·31) and (3·3·32), these formulae determine the $P_n(i)$ in terms of $A_1, B_1, C_1, D_1$ and $a_1, b_1, c_1, d_1$. Since these latter probabilities are expressed in terms of the original data represented by the $p_i$ and $q_i$, the problem of determining the distribution $P_n(i)$ is completely solved.

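The closed forms (3·3·33) and (3·3·34) can be checked against a brute-force iteration of the generations. In the sketch below the first two functions restate (3·3·27) and (3·3·28); the distributions p and q and the value $W = 0.3$ are arbitrary illustrative inputs, not data from the text.

```python
def gamete_probs(p, W):
    """(3.3.27): P{gh}, P{gH}, P{Gh}, P{GH} for a parent drawn from p."""
    p1, p2, p3, p4, p5, p6, p7, p8, p9, p10 = p
    A = p1 + p2 / 2 + p4 / 2 + p5 / 2 * (1 - W / 2) + p6 * W / 4
    B = p2 / 2 + p3 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p7 / 2
    C = p4 / 2 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p8 + p9 / 2
    D = p5 / 2 * (1 - W / 2) + p6 * W / 4 + p7 / 2 + p9 / 2 + p10
    return A, B, C, D


def offspring_dist(mat, pat):
    """(3.3.28): distribution of T1, ..., T10 among the offspring."""
    A, B, C, D = mat
    a, b, c, d = pat
    return [A * a, A * b + B * a, B * b, A * c + C * a, A * d + D * a,
            B * c + C * b, B * d + D * b, C * c, C * d + D * c, D * d]


def closed_form(A2, B2, C2, D2, r2, rn):
    """(3.3.34): P_n(1), ..., P_n(10) through A_2, ..., D_2, r_2 and r_n."""
    A, B, C, D = A2 - r2 + rn, B2 + r2 - rn, C2 + r2 - rn, D2 - r2 + rn
    return [A * A, 2 * A * B, B * B, 2 * A * C, 2 * A * D,
            2 * B * C, 2 * B * D, C * C, 2 * C * D, D * D]


# hypothetical first population mating and break probability
p = [0.1, 0.0, 0.2, 0.0, 0.3, 0.1, 0.0, 0.1, 0.0, 0.2]
q = [0.4, 0.1, 0.0, 0.0, 0.0, 0.2, 0.1, 0.0, 0.0, 0.2]
W = 0.3

mat, pat = gamete_probs(p, W), gamete_probs(q, W)
cells, dists = {}, {}
for n in range(1, 16):
    cells[n] = mat                          # (A_n, B_n, C_n, D_n)
    dists[n] = offspring_dist(mat, pat)     # P_n(1), ..., P_n(10)
    # no selection and no linkage with sex: both sexes of the next mating
    # have the distribution of the nth generation born
    mat = pat = gamete_probs(dists[n], W)

A2, B2, C2, D2 = cells[2]
r2 = A2 * D2 - B2 * C2
err_r = err_P = 0.0
for n in range(2, 16):
    A, B, C, D = cells[n]
    rn = r2 * (1 - W / 2) ** (n - 2)                                 # (3.3.33)
    err_r = max(err_r, abs((A * D - B * C) - rn))
    err_P = max(err_P, max(abs(x - y) for x, y in
                           zip(dists[n], closed_form(A2, B2, C2, D2, r2, rn))))
print(err_r, err_P)  # both should be of the order of rounding error
```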
★3·3·7. Stabilization of the distribution of genetical types in successive generations. In this subsection we will examine the formulae for the $P_n(i)$ from the point of view of changes in the distribution of genetical types in successive generations.

★Reviewing formulae (3·3·34) for the $P_n(i)$, it is seen that on the right-hand side n appears only through

$$
r_n = r_2\bigl(1 - \tfrac12 W\bigr)^{n-2}
$$

for $n = 2, 3, \ldots$. It follows that, for the stabilization of the distribution of genetical types, it is both necessary and sufficient that $r_n$ be independent of n, so that

$$
r_2 = r_3 = \cdots = r_n.
$$

Should these equalities hold for all n, then $P_n(i) = P_2(i)$, so that stabilization begins with the second generation born at the latest.



Since, for $m > 2$,

$$
r_{m-1} - r_m = \tfrac12 W r_2\bigl(1 - \tfrac12 W\bigr)^{m-3}
$$

and since $1 - \tfrac12 W > 0$, it follows that for stabilization of the distribution it is necessary and sufficient that either $W = 0$ or $r_2 = 0$ or both. The first of these conditions means that the chromosome pair carrying the pairs gG and hH never breaks between the loci of these genes and, therefore, the combinations gh, gH, Gh, GH carried in single barrels of the chromosome pair are, so to speak, indissoluble, and each combination behaves like just one gene.

★By substituting (3·3·29) into (3·3·32), we see that the other case of stabilization occurs when

$$
r_2 = \tfrac14\bigl\{A_1D_1 - B_1C_1 + a_1d_1 - b_1c_1 + (1 - W)(A_1d_1 - B_1c_1 + D_1a_1 - C_1b_1)\bigr\} = 0. \tag{3·3·35}
$$

It is interesting to notice that, whatever be W, the value of $r_2$ may or may not equal zero, depending upon the original probabilities $p_i$ and $q_i$. One might expect that stabilization of types will always occur when the two pairs of genes are not linked, that is, when $W = 1$. The following example shows that this presumption is wrong.

★Assume that the first population mating is composed entirely of double recessives gg,hh and double dominants GG,HH, so that $p_1 = q_1 > 0$ and $p_{10} = q_{10} > 0$, with $p_1 + p_{10} = q_1 + q_{10} = 1$. Then $A_1 = a_1 = p_1$, $B_1 = b_1 = C_1 = c_1 = 0$, and $D_1 = d_1 = p_{10}$. It follows from (3·3·35) that

$$
r_2 = \tfrac12 p_1p_{10}(2 - W) > 0,
$$

so that, whatever $W > 0$, there will be no stabilization of the distribution of genetical types in any of the successive generations. We conclude, therefore, that, apart from the case $W = 0$, the presence or absence of linkage between the genes considered is not the determining factor for stabilization of the distribution of genetical types. Since the distribution of genetical types in the original population mating depends upon eighteen probabilities (if $W = 1$) or upon twenty probabilities (if $W \neq 1$), which are subject only to the conditions that $\sum p_i = \sum q_i = 1$, there are many different ways to obtain $r_2 = 0$ and thus guarantee stabilization by the second generation at the latest. An interesting case is when stabilization begins with the original generation. Later we will discuss this important case in detail. For the moment, however, let us study the general case where no stabilization occurs and, therefore, $W \neq 0$ and, also, $r_2 \neq 0$.

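The counterexample can be reproduced numerically. The sketch below (helper names hypothetical) restates (3·3·27) and (3·3·28), builds a first population mating of double recessives and double dominants only, with the illustrative value $p_1 = 0.3$ in both sexes, and checks that $r_2 = \tfrac12 p_1p_{10}(2 - W)$ and that the distribution keeps changing from one generation born to the next for several values of $W > 0$.

```python
def gamete_probs(p, W):
    """(3.3.27): P{gh}, P{gH}, P{Gh}, P{GH} for a parent drawn from p."""
    p1, p2, p3, p4, p5, p6, p7, p8, p9, p10 = p
    A = p1 + p2 / 2 + p4 / 2 + p5 / 2 * (1 - W / 2) + p6 * W / 4
    B = p2 / 2 + p3 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p7 / 2
    C = p4 / 2 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p8 + p9 / 2
    D = p5 / 2 * (1 - W / 2) + p6 * W / 4 + p7 / 2 + p9 / 2 + p10
    return A, B, C, D


def offspring_dist(mat, pat):
    """(3.3.28): distribution of T1, ..., T10 among the offspring."""
    A, B, C, D = mat
    a, b, c, d = pat
    return [A * a, A * b + B * a, B * b, A * c + C * a, A * d + D * a,
            B * c + C * b, B * d + D * b, C * c, C * d + D * c, D * d]


def generations(p1, W, n_max):
    """pi_1 consists, in both sexes, of gg,hh (share p1) and GG,HH (share
    1 - p1). Returns r_2 and the list [P_1, ..., P_n_max]."""
    p = [p1, 0, 0, 0, 0, 0, 0, 0, 0, 1 - p1]
    cells = gamete_probs(p, W)
    dists = []
    for _ in range(n_max):
        dists.append(offspring_dist(cells, cells))
        cells = gamete_probs(dists[-1], W)
        if len(dists) == 1:
            A2, B2, C2, D2 = cells          # cells of the second mating
            r2 = A2 * D2 - B2 * C2
    return r2, dists


p1 = 0.3
results = {}
for W in (0.25, 0.5, 1.0):
    r2, dists = generations(p1, W, 8)
    # largest change of any P_n(i) from one generation born to the next
    changes = [max(abs(x - y) for x, y in zip(dists[k], dists[k + 1]))
               for k in range(len(dists) - 1)]
    results[W] = (r2, changes)
    print(W, r2, p1 * (1 - p1) * (2 - W) / 2, min(changes))
```

Setting W = 0 in the same code leaves the distribution unchanged after the first generation, which is the other case covered by the text.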
★In these circumstances, the character of changes in the successive generations is determined by the change in

$$
r_n = r_2\bigl(1 - \tfrac12 W\bigr)^{n-2}.
$$

If $W > 0$, then $1 - \tfrac12 W < 1$ and $r_n$ becomes smaller and smaller, approaching zero, as n is increased. We write this technically as

$$
\lim_{n\to\infty} r_n = 0.
$$

It follows that the values of $A_n$, $B_n$, $C_n$, and $D_n$ approach the limits $A_2 - r_2$, $B_2 + r_2$, $C_2 + r_2$, and $D_2 - r_2$, respectively, as n increases. Consequently, as $n \to \infty$, $P_n(i)$ tends to a limit, say,

$$
\lim_{n\to\infty} P_n(i) = P(i),
$$


where, using (3·3·34),

$$
\begin{aligned}
P(1) &= (A_2 - r_2)^2, & P(6) &= 2(B_2 + r_2)(C_2 + r_2),\\
P(2) &= 2(A_2 - r_2)(B_2 + r_2), & P(7) &= 2(B_2 + r_2)(D_2 - r_2),\\
P(3) &= (B_2 + r_2)^2, & P(8) &= (C_2 + r_2)^2,\\
P(4) &= 2(A_2 - r_2)(C_2 + r_2), & P(9) &= 2(C_2 + r_2)(D_2 - r_2),\\
P(5) &= 2(A_2 - r_2)(D_2 - r_2), & P(10) &= (D_2 - r_2)^2.
\end{aligned}
\tag{3·3·36}
$$



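The distribution of genetical types represented by the probabilities $P(i)$ will be called the limiting distribution. It can be easily expressed in terms of $A_1, B_1, C_1, D_1$ and $a_1, b_1, c_1, d_1$ by noticing that, if we denote

$$
r^* = \tfrac14\bigl[(A_1 + a_1)(D_1 + d_1) - (B_1 + b_1)(C_1 + c_1)\bigr],
$$

then, by (3·3·30) through (3·3·32), $r^* = r_2 + \tfrac12 W r_1$ and

$$
\begin{aligned}
A_2 - r_2 &= \tfrac12(A_1 + a_1) - r^*,\\
B_2 + r_2 &= \tfrac12(B_1 + b_1) + r^*,\\
C_2 + r_2 &= \tfrac12(C_1 + c_1) + r^*,\\
D_2 - r_2 &= \tfrac12(D_1 + d_1) - r^*.
\end{aligned}
$$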

These formulae, combined with the expressions for the $P(i)$, provide an easy way of computing the limiting distribution from the data relating to the first population mating. It will be noticed that the $P(i)$ are determined by arithmetic means of the type $\tfrac12(A_1 + a_1)$. Recalling the formulae (3·3·27) for the latter probabilities, it will be seen that the terms involving W in $\tfrac12(A_1 + a_1)$ add up to $\tfrac18 W(p_6 + q_6 - p_5 - q_5)$, and that the same expression, with one sign or the other, appears in the other three means. Hence the limiting probabilities will be independent of the probability W whenever

$$
p_5 + q_5 = p_6 + q_6.
$$

On the other hand, with a fixed value of $r^*$, the speed with which $P(i)$ is approached by the $P_n(i)$ always depends on W. In fact, this speed depends on how quickly $r_n$ tends to zero, and this, in turn, depends on how small the factor $(1 - \tfrac12 W)$ is. It is seen that the greater the probability W, the faster the probabilities $P_n(i)$ tend to their limits $P(i)$. The greatest speed is attained when $W = 1$, that is, when the two pairs of genes are unlinked.

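The limiting distribution, its independence of W when $p_5 + q_5 = p_6 + q_6$, and the effect of W on the speed of convergence can all be seen in a small experiment. The sketch restates (3·3·27) and (3·3·28), computes $P(i)$ through $r^*$, and counts how many generations the direct iteration needs to come within $10^{-9}$ of the limit. The function names and inputs are hypothetical; the inputs happen to satisfy $p_5 + q_5 = p_6 + q_6$.

```python
def gamete_probs(p, W):
    """(3.3.27): P{gh}, P{gH}, P{Gh}, P{GH} for a parent drawn from p."""
    p1, p2, p3, p4, p5, p6, p7, p8, p9, p10 = p
    A = p1 + p2 / 2 + p4 / 2 + p5 / 2 * (1 - W / 2) + p6 * W / 4
    B = p2 / 2 + p3 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p7 / 2
    C = p4 / 2 + p5 * W / 4 + p6 / 2 * (1 - W / 2) + p8 + p9 / 2
    D = p5 / 2 * (1 - W / 2) + p6 * W / 4 + p7 / 2 + p9 / 2 + p10
    return A, B, C, D


def offspring_dist(mat, pat):
    """(3.3.28): distribution of T1, ..., T10 among the offspring."""
    A, B, C, D = mat
    a, b, c, d = pat
    return [A * a, A * b + B * a, B * b, A * c + C * a, A * d + D * a,
            B * c + C * b, B * d + D * b, C * c, C * d + D * c, D * d]


def limiting_dist(p, q, W):
    """(3.3.36) written through r*: the limit of P_n(i) (for W > 0)."""
    A1, B1, C1, D1 = gamete_probs(p, W)
    a1, b1, c1, d1 = gamete_probs(q, W)
    r_star = ((A1 + a1) * (D1 + d1) - (B1 + b1) * (C1 + c1)) / 4
    cells = ((A1 + a1) / 2 - r_star, (B1 + b1) / 2 + r_star,
             (C1 + c1) / 2 + r_star, (D1 + d1) / 2 - r_star)
    return offspring_dist(cells, cells)


def nth_generation(p, q, W, n):
    """P_n(1), ..., P_n(10) obtained by iterating the generations directly."""
    mat, pat = gamete_probs(p, W), gamete_probs(q, W)
    for _ in range(n):
        dist = offspring_dist(mat, pat)
        mat = pat = gamete_probs(dist, W)
    return dist


# hypothetical data with p5 + q5 = p6 + q6 (0.3 + 0.0 = 0.1 + 0.2)
p = [0.1, 0.0, 0.2, 0.0, 0.3, 0.1, 0.0, 0.1, 0.0, 0.2]
q = [0.4, 0.1, 0.0, 0.0, 0.0, 0.2, 0.1, 0.0, 0.0, 0.2]

limits, needed = {}, {}
for W in (0.25, 0.5, 1.0):
    limits[W] = limiting_dist(p, q, W)
    n = 1
    # count the generations needed to come within 1e-9 of the limit
    while max(abs(x - y) for x, y in
              zip(nth_generation(p, q, W, n), limits[W])) > 1e-9:
        n += 1
    needed[W] = n
print(needed)
```

With these inputs the three limits agree, while the number of generations needed grows as W decreases, in line with the factor $(1 - \tfrac12 W)$.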
★The results obtained may be summarized as follows. 

Theorem 3·1. Whatever be $\pi_1$, the first population mating, the distribution of genetical types in the nth generation born without selection and under panmixia with respect to the genes g, G, h, and H either stabilizes at the second generation born at the latest or tends to a limiting distribution. If $W = 0$, then stabilization occurs, irrespective of the distribution in the original population mating. If $W > 0$, then for stabilization to occur it is necessary and sufficient that

$$
r_2 = A_2D_2 - B_2C_2 = 0.
$$

If $W > 0$ and $r_2 \neq 0$, then no two generations born have the same distribution of genetical types, but the distribution in the nth generation tends to the limiting distribution given by the expressions $P(i)$ in (3·3·36).

3·3·8. Stable distributions. Let us now turn to the case of stabilization beginning with the original generation mating. Although primarily interested in panmixia, we will treat the problem as one of a broad and interesting category.

Consider generally one or more pairs of genes and denote by m the total number of genetical types which are possible with these genes. If the number of pairs of genes is one, then $m = 3$. If we consider two pairs of nonlinked genes, then $m = 9$. With two pairs of linked genes, $m = 10$, etc. Further, let $p_1, p_2, \ldots, p_m$ and $q_1, q_2, \ldots, q_m$ represent the distributions of these types among females and among males, respectively. The set of 2m numbers $p_1, p_2, \ldots, p_m, q_1, q_2, \ldots, q_m$ will be described as the bisexual distribution of genetical types. Finally, let S stand for a system of reproduction of successive generations from one generation born to the next. For example, S may stand for lack of selection combined with panmixia in the populations mating. Alternatively, S may be mass selection against recessives combined with panmixia, etc.

Definition 3·3. A bisexual distribution of genetical types represented by $p_1, p_2, \ldots, p_m$, $q_1, q_2, \ldots, q_m$ is called stable with respect to a system of reproduction S if, whenever in a population π the distribution of the m genetical types among females is $p_1, p_2, \ldots, p_m$ and that among males is $q_1, q_2, \ldots, q_m$, then the distributions of the same genetical types among the females and the males in the generation born out of mating in π under system S are also represented by $p_1, p_2, \ldots, p_m$ and $q_1, q_2, \ldots, q_m$, respectively.




If the genes considered are not linked with sex, then the sons and daughters born of any mating have the same distribution of genetical types, so for a bisexual distribution to be stable it is necessary, but not sufficient, that the two component distributions be the same: $p_1 = q_1$, $p_2 = q_2$, $\ldots$, $p_m = q_m$. We will study in detail the case where the genes considered are not linked with sex and where the system of reproduction is panmixia with no selection. This system will be denoted by $S_0$. Other cases appear in the exercises.

Theorem 3·2. In order that the distribution p, q, r of the three genetical types GG, gG, and gg with respect to one pair of genes unlinked with sex be stable with respect to the system $S_0$, it is necessary and sufficient that

$$
p = (1 - \sqrt{r})^2, \tag{3·3·37}
$$

$$
q = 2(1 - \sqrt{r})\sqrt{r}, \tag{3·3·38}
$$

where r is any number between zero and unity. Since $p + q + r = 1$, only one of the above equations need be mentioned, and we may simply require that $\sqrt{p} + \sqrt{r} = 1$.

Essentially, the proof of Theorem 3·2 is contained in the reasoning of subsection 3·3·3. It is presented here explicitly in order to familiarize the student with an important type of theorem on necessary and sufficient conditions. The usual form of such theorems is: for a proposition A to be true, it is necessary and sufficient that another proposition B be true. This assertion means that (i) whenever A happens to be true, then B is necessarily true, and (ii) whenever B happens to be true, then A must be true. Statement (i) asserts the necessity of B for the correctness of A; statement (ii) asserts that the correctness of B is sufficient to infer that A is true.

The proof of a theorem on necessary and sufficient conditions is divided into two parts. One part proves the necessity and the other part the sufficiency of the condition. To prove that B is a necessary condition for A, one assumes that A is true and then deduces B as a consequence of A. To prove that B is a sufficient condition for A, one assumes that B is true and then deduces A as a consequence of B. This procedure is illustrated in the proof of Theorem 3·2.

Proof of necessity. Assume that the distribution represented by the numbers p, q, r is stable with respect to $S_0$. If so, then the probabilities that a member of the first generation born under panmixia will be a dominant or a recessive must be equal to p and r, respectively. These probabilities are given by (3·3·8). Thus, if p, q, r represent a stable distribution, then

$$
p = \bigl(p + \tfrac12 q\bigr)^2 \quad\text{and}\quad r = \bigl(r + \tfrac12 q\bigr)^2.
$$

But this means that

$$
\sqrt{p} + \sqrt{r} = \bigl(p + \tfrac12 q\bigr) + \bigl(r + \tfrac12 q\bigr) = 1,
$$

and therefore

$$
\begin{aligned}
p &= (1 - \sqrt{r})^2,\\
q &= 1 - r - p = 1 - r - (1 - \sqrt{r})^2 = 2(1 - \sqrt{r})\sqrt{r},
\end{aligned}
$$

which proves the necessity of the conditions stated in Theorem 3·2.

Proof of sufficiency. Assume now that in the distribution p, q, r of the three genetical types, the probability of a dominant is

$$
p = (1 - \sqrt{r})^2.
$$

In order to prove that this assumption implies stability with respect to $S_0$, it is sufficient to compute the probabilities $P_1$, $Q_1$, $R_1$ of the three genetical types in the first generation born under the system $S_0$. Again using formulae (3·3·8), (3·3·37), and (3·3·38), we have

$$
\begin{aligned}
P_1 &= \bigl(p + \tfrac12 q\bigr)^2 = \bigl[(1 - \sqrt{r})^2 + (1 - \sqrt{r})\sqrt{r}\bigr]^2\\
&= (1 - \sqrt{r})^2\bigl[1 - \sqrt{r} + \sqrt{r}\bigr]^2\\
&= (1 - \sqrt{r})^2 = p.
\end{aligned}
$$

Similarly,

$$
\begin{aligned}
R_1 &= \bigl(r + \tfrac12 q\bigr)^2 = \bigl[r + (1 - \sqrt{r})\sqrt{r}\bigr]^2\\
&= r\bigl[\sqrt{r} + 1 - \sqrt{r}\bigr]^2\\
&= r.
\end{aligned}
$$

But then

$$
Q_1 = 1 - P_1 - R_1 = 1 - p - r = q.
$$

Thus, whenever the distribution p, q, r satisfies the conditions of Theorem 3·2, the distribution of the three genetical types in the first generation born under system $S_0$ is identical with p, q, r. This completes the proof of Theorem 3·2.

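Theorem 3·2 lends itself to an exact check with rational arithmetic. The sketch assumes that (3·3·8) gives the offspring probabilities of GG and gg as $(p + \tfrac12 q)^2$ and $(r + \tfrac12 q)^2$, which is how the proof above uses it; the function names are hypothetical.

```python
from fractions import Fraction


def next_generation(p, q, r):
    """One generation of panmixia with no selection for one pair of genes:
    probabilities of GG, gG, gg among the offspring, assuming the form
    P_1 = (p + q/2)^2, R_1 = (r + q/2)^2 of formulae (3.3.8)."""
    P1 = (p + q / 2) ** 2
    R1 = (r + q / 2) ** 2
    return P1, 1 - P1 - R1, R1


def stable_from_root(s):
    """The distribution (p, q, r) of Theorem 3.2, written through s = sqrt(r)."""
    return (1 - s) ** 2, 2 * (1 - s) * s, s ** 2


for k in range(0, 11, 2):
    s = Fraction(k, 10)
    d = stable_from_root(s)
    print(s, d == next_generation(*d))      # stable: reproduces itself

# a distribution violating sqrt(p) + sqrt(r) = 1 is not stable,
# but its offspring distribution already satisfies the condition
d0 = (Fraction(1, 2), Fraction(0), Fraction(1, 2))
d1 = next_generation(*d0)
print(d1, next_generation(*d1) == d1)
```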
★Consider now two pairs of genes g, G and h, H unlinked with sex but linked with each other. As before, denote by W the probability that the chromosome pair carrying these genes will break between their loci, and assume $0 < W \le 1$.

★Theorem 3·3. For the distribution $p_1, p_2, \ldots, p_{10}$ of the ten genetical types with respect to two pairs of genes g, G and h, H to be stable with respect to the system of reproduction $S_0$, it is necessary and sufficient

(i) That the distribution $p_1, p_2, \ldots, p_{10}$ implies a distribution of types gg, gG, and GG which is stable with respect to $S_0$;

(ii) That the distribution $p_1, p_2, \ldots, p_{10}$ implies a distribution of types hh, hH, and HH which is stable with respect to $S_0$;

(iii) That the probabilities of the two different types of hybrids gG,hH and gG,Hh be equal, $p_5 = p_6$;

(iv) That, if I stands for an individual of a population with the distribution of genetical types $p_1, p_2, \ldots, p_{10}$, this distribution implies

$$
\begin{aligned}
P\{I : gg,hh\} &= P\{I : gg\}\,P\{I : hh\},\\
P\{I : gg,HH\} &= P\{I : gg\}\,P\{I : HH\},\\
P\{I : GG,hh\} &= P\{I : GG\}\,P\{I : hh\},\\
P\{I : GG,HH\} &= P\{I : GG\}\,P\{I : HH\}.
\end{aligned}
$$

Condition (iv) means, in effect, that the set of properties (I : gg) and (I : GG) is independent of the set (I : hh) and (I : HH).

★Proof of necessity. Assume that the distribution $p_1, p_2, \ldots, p_{10}$ is stable, and consider a population π in which the proportion of females and of males having genetical type $T_i$ equals $p_i$, for $i = 1, 2, \ldots, 10$. Assume further that the reproduction of successive generations beginning with π follows the system $S_0$. Then, whatever $n = 1, 2, \ldots$, $P_n(i) = p_i$. Also, the values of the $P_n(i)$ are given by the formulae in subsection 3·3·6. Let J stand for an individual of the nth generation born. To prove the necessity of condition (i) it is sufficient to notice that

$$
P\{J : gg\} = P\{J : gg,hh\} + P\{J : gg,hH\} + P\{J : gg,HH\}.
$$

Because of the assumed stability of $p_1, p_2, \ldots, p_{10}$, we have

$$
P\{J : gg\} = p_1 + p_2 + p_3 = P\{I : gg\}.
$$

Similarly,

$$
P\{J : GG\} = P\{I : GG\} \quad\text{and}\quad P\{J : gG\} = P\{I : gG\}.
$$

The same proof applies to condition (ii). To prove the necessity of condition (iii), we refer to a result already established, namely that, with $W > 0$, stabilization of the distributions in the successive generations occurs only if $r_2 = 0$, so that

$$
r_2 = r_3 = \cdots = r_n = A_nD_n - B_nC_n = 0.
$$

By referring to the table of probabilities $P_n(i)$, where $P_n(5) = 2A_nD_n$ and $P_n(6) = 2B_nC_n$, we note that this implies

$$
P\{J : gG,hH\} = P\{J : gG,Hh\},
$$

or, because of the assumed stability of $p_1, p_2, \ldots, p_{10}$,

$$
p_5 = p_6.
$$

This proves the necessity of condition (iii).




★We now want to prove the necessity of the first of the four equations forming condition (iv). First notice that, because of the assumed stability,

$$
\begin{aligned}
P\{J : gg,hh\} &= A_n^2 = p_1,\\
P\{J : gg\} &= p_1 + p_2 + p_3 = (A_n + B_n)^2,\\
P\{J : hh\} &= p_1 + p_4 + p_8 = (A_n + C_n)^2.
\end{aligned}
$$

Since all these quantities are nonnegative, the first equation in condition (iv) is, therefore, equivalent to, say,

$$
\Delta = A_n - (A_n + B_n)(A_n + C_n) = 0.
$$

Notice now that, since $A_n + B_n + C_n + D_n = 1$,

$$
A_n = A_n(A_n + B_n + C_n + D_n).
$$

Substituting this expression instead of the first term in the formula for Δ, we obtain

$$
\begin{aligned}
\Delta &= A_n(A_n + B_n) + A_nC_n + A_nD_n - (A_n + B_n)(A_n + C_n)\\
&= A_nD_n - B_nC_n.
\end{aligned}
$$

But, as we have just seen, this last expression must be equal to zero. Thus, the stability of the distribution $p_1, p_2, \ldots, p_{10}$ implies the first of the equations (iv). The proof of the other three equations (iv) is similar. This completes the proof of the necessity of the conditions mentioned in Theorem 3·3.

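The algebraic identity $\Delta = A_nD_n - B_nC_n$ used in the proof holds whenever $A_n + B_n + C_n + D_n = 1$. The sketch below is a sanity check with exact fractions at randomly chosen points satisfying that constraint, not a substitute for the proof; the function name `delta` is ours.

```python
import random
from fractions import Fraction


def delta(A, B, C, D):
    """Delta = A - (A + B)(A + C), as defined in the proof of condition (iv)."""
    return A - (A + B) * (A + C)


random.seed(0)
mismatches = 0
for _ in range(1000):
    w = [random.randint(1, 100) for _ in range(4)]
    s = sum(w)
    A, B, C, D = (Fraction(x, s) for x in w)   # exact point with A + B + C + D = 1
    if delta(A, B, C, D) != A * D - B * C:
        mismatches += 1
print(mismatches)  # 0
```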
★Proof of sufficiency. The proof of sufficiency consists in assuming that the conditions of Theorem 3·3 are satisfied, then computing the probabilities $P_1(i)$ of the genetical types in the first generation born and comparing them with the $p_i$. Using conditions (i) and (ii) and referring to Theorem 3·2, we conclude that there must exist two numbers, say $R_g$ and $R_h$, such that

$$
\begin{aligned}
P\{I : gg\} &= p_1 + p_2 + p_3 = R_g,\\
P\{I : GG\} &= p_8 + p_9 + p_{10} = (1 - \sqrt{R_g})^2,\\
P\{I : gG\} &= p_4 + p_5 + p_6 + p_7 = 2(1 - \sqrt{R_g})\sqrt{R_g},\\
P\{I : hh\} &= p_1 + p_4 + p_8 = R_h,\\
P\{I : HH\} &= p_3 + p_7 + p_{10} = (1 - \sqrt{R_h})^2,\\
P\{I : hH\} &= p_2 + p_5 + p_6 + p_9 = 2(1 - \sqrt{R_h})\sqrt{R_h}.
\end{aligned}
$$

We will now use condition (iv) and then condition (iii) to obtain the values of $p_1, p_2, \ldots, p_{10}$ in terms of $R_g$ and $R_h$. It will be convenient to arrange these values in a three-by-three table with an additional column and row for the marginal probabilities, as was done for $P_n(i)$ in Table 3·5.

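★Condition (iv) determines directly the corners of the three-by-three table:

$$
p_1 = R_gR_h,\qquad p_3 = R_g(1 - \sqrt{R_h})^2,\qquad p_8 = (1 - \sqrt{R_g})^2R_h,\qquad p_{10} = (1 - \sqrt{R_g})^2(1 - \sqrt{R_h})^2.
$$

In order to fill in the gaps, we notice that the marginal probabilities, which are already determined by conditions (i) and (ii), are the sums of the probabilities $p_i$ in the corresponding rows and columns. Accordingly, we have

$$
\begin{aligned}
p_2 &= R_g - p_1 - p_3 = R_g\bigl[1 - R_h - (1 - \sqrt{R_h})^2\bigr] = 2R_g(1 - \sqrt{R_h})\sqrt{R_h},\\
p_4 &= R_h - p_1 - p_8 = 2R_h(1 - \sqrt{R_g})\sqrt{R_g},\\
p_7 &= (1 - \sqrt{R_h})^2 - p_3 - p_{10} = 2(1 - \sqrt{R_h})^2(1 - \sqrt{R_g})\sqrt{R_g},\\
p_9 &= (1 - \sqrt{R_g})^2 - p_8 - p_{10} = 2(1 - \sqrt{R_g})^2(1 - \sqrt{R_h})\sqrt{R_h},
\end{aligned}
$$

and, finally,

$$
p_5 + p_6 = 2(1 - \sqrt{R_g})\sqrt{R_g} - p_4 - p_7 = 4(1 - \sqrt{R_g})\sqrt{R_g}\,(1 - \sqrt{R_h})\sqrt{R_h}.
$$

Condition (iii), then, gives

$$
p_5 = p_6 = 2(1 - \sqrt{R_g})\sqrt{R_g}\,(1 - \sqrt{R_h})\sqrt{R_h}.
$$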

★It is seen that conditions (i) through (iv) determine the probabilities $p_i$ as functions of two numbers $R_g$ and $R_h$, which can be chosen arbitrarily between zero and unity. Substituting the formulae for the $p_i$ into the expressions (3·3·27) for the $A_1$, $B_1$, $C_1$, and $D_1$, we obtain, after some easy algebra,

$$a_1 = A_1 = \sqrt{R_g}\sqrt{R_h},$$

$$b_1 = B_1 = \sqrt{R_g}(1 - \sqrt{R_h}),$$

$$c_1 = C_1 = (1 - \sqrt{R_g})\sqrt{R_h},$$

$$d_1 = D_1 = (1 - \sqrt{R_g})(1 - \sqrt{R_h}).$$

The last step consists in substituting these values into formulae (3·3·28), which give the expressions for the $P_1(i)$. The substitution is easy and gives

$$P_1(1) = R_gR_h = p_1,$$

$$P_1(2) = 2R_g(1 - \sqrt{R_h})\sqrt{R_h} = p_2,$$

etc. This completes the proof of Theorem 3·3.



★It is interesting to note that the conditions of stability with respect to the system of reproduction $S_0$ are independent of the probability $W > 0$ that the chromosome pair will break between the loci of the two pairs of genes considered.

★Theorem 3·3 implies that to construct an example of a distribution stable with respect to $S_0$, it is sufficient to select two numbers $R_g$ and $R_h$ between zero and unity and substitute them into the formulae for the $p_i$ above. For example, taking $R_g = .49$ and $R_h = .16$, so that $\sqrt{R_g} = .7$ and $\sqrt{R_h} = .4$, we obtain the following distribution, which is stable with respect to $S_0$.


            hh             hH                  HH             Total
  gg     p1 = .0784     p2 = .2352          p3 = .1764          .49
  gG     p4 = .0672     p5 = p6 = .1008     p7 = .1512          .42
  GG     p8 = .0144     p9 = .0432          p10 = .0324         .09
  Total     .16            .48                 .36             1.00
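The formulae just obtained say that each $p_i$ is the product of the stable single-locus probabilities, with the double heterozygotes split equally between the types gG,hH and gG,Hh. A minimal Python sketch, assuming the type numbering of the table above (row by row, with p5 = gG,hH and p6 = gG,Hh), evaluates them for $R_g = .49$, $R_h = .16$ and checks the margins and the gamete probability $A_1 = \sqrt{R_g}\sqrt{R_h}$.

```python
from math import sqrt, isclose

def stable_distribution(Rg, Rh):
    """p1..p10 of Theorem 3·3 for 0 < Rg, Rh < 1.

    Order of types: gg,hh; gg,hH; gg,HH; gG,hh; gG,hH; gG,Hh; gG,HH;
    GG,hh; GG,hH; GG,HH.
    """
    x, y = sqrt(Rg), sqrt(Rh)
    g = (x * x, 2 * x * (1 - x), (1 - x) ** 2)   # P(gg), P(gG), P(GG)
    h = (y * y, 2 * y * (1 - y), (1 - y) ** 2)   # P(hh), P(hH or Hh), P(HH)
    return {
        1: g[0] * h[0], 2: g[0] * h[1], 3: g[0] * h[2],
        4: g[1] * h[0], 5: g[1] * h[1] / 2, 6: g[1] * h[1] / 2, 7: g[1] * h[2],
        8: g[2] * h[0], 9: g[2] * h[1], 10: g[2] * h[2],
    }

def gamete_gh(p):
    """Probability of a gh gamete. Recombination changes the gametes only of
    types 5 and 6, and when p5 == p6 its gains and losses cancel."""
    return p[1] + (p[2] + p[4] + p[5]) / 2

p = stable_distribution(0.49, 0.16)
assert isclose(sum(p.values()), 1.0)
for i in range(1, 11):
    print(f"p{i} = {p[i]:.4f}")
rows = {"gg": (1, 2, 3), "gG": (4, 5, 6, 7), "GG": (8, 9, 10)}
cols = {"hh": (1, 4, 8), "hH": (2, 5, 6, 9), "HH": (3, 7, 10)}
for label, idx in {**rows, **cols}.items():
    print(label, round(sum(p[i] for i in idx), 4))
print("A1 =", round(gamete_gh(p), 4), "sqrt(Rg)*sqrt(Rh) =", round(0.7 * 0.4, 4))
```

Writing the $p_i$ as products of marginals makes conditions (i), (ii) and (iv) hold by construction, while the equal split of the double heterozygotes is condition (iii).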


★3·3·9. Example. Certain interesting points of the foregoing theory are concerned with the role of the probability W in the question of stabilization of the distribution of genetical types. To illustrate some of these points, we select an example in which the distribution of the genetical types in the first generation born $\Pi_1$ is independent of W. We then consider the distributions $P_n(i)$ in the following generations computed under several alternative hypotheses concerning W, namely, W = 0, W = .5, and W = 1.0.

★Upon inspecting the formulae for $A_1$, $B_1$, $C_1$, $D_1$, it is evident that the distribution of types in $\Pi_1$ will be independent of W if $p_5 = p_6$ and $q_5 = q_6$. With this in mind, we adopt the following distributions of genetical types in the first population mating $\pi_1$.

          Mothers                    Fathers
  p1 = .4     p6 = .0         q1 = .1     q6 = .0
  p2 = .0     p7 = .0         q2 = .3     q7 = .0
  p3 = .0     p8 = .0         q3 = .0     q8 = .0
  p4 = .3     p9 = .2         q4 = .1     q9 = .3
  p5 = .0     p10 = .1        q5 = .0     q10 = .2


Easy arithmetic gives

$$A_1 = .55, \quad B_1 = 0, \quad C_1 = .25, \quad D_1 = .20;$$

$$a_1 = .30, \quad b_1 = .15, \quad c_1 = .20, \quad d_1 = .35.$$
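Formulae (3·3·27) are not reproduced in this section, so the following Python sketch rebuilds the gamete probabilities under two stated assumptions. First, types 1 to 10 carry the chromosome pairs listed in `TYPES` (type 5, gG,hH, carrying gh and GH; type 6, gG,Hh, carrying gH and Gh). Second, a gamete is recombinant with probability W/2; this is the reading under which the entries of Tables 3·8 to 3·10 come out as printed. Applied to the mothers of $\pi_1$, it gives the same $A_1, B_1, C_1, D_1$ for every W because $p_5 = p_6 = 0$. The fathers' gametes follow in the same way from the $q_i$.

```python
from math import isclose

# Chromosome pair carried by types 1..10 (assumed numbering, see lead-in).
TYPES = [("gh", "gh"), ("gh", "gH"), ("gH", "gH"), ("gh", "Gh"), ("gh", "GH"),
         ("gH", "Gh"), ("gH", "GH"), ("Gh", "Gh"), ("Gh", "GH"), ("GH", "GH")]

def gametes(dist, W):
    """Probabilities of gametes gh, gH, Gh, GH from the ten-type list `dist`.

    Assumption: a gamete is recombinant with probability W / 2.
    """
    r = W / 2
    out = dict.fromkeys(("gh", "gH", "Gh", "GH"), 0.0)
    for (first, second), prob in zip(TYPES, dist):
        out[first] += prob * (1 - r) / 2    # parental chromosome `first`
        out[second] += prob * (1 - r) / 2   # parental chromosome `second`
        # recombinants: g-allele of one chromosome with h-allele of the other
        out[first[0] + second[1]] += prob * r / 2
        out[second[0] + first[1]] += prob * r / 2
    return out

# Mothers of pi_1: p1 = .4, p4 = .3, p9 = .2, p10 = .1, all other p_i = 0.
MOTHERS = [0.4, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0, 0.2, 0.1]
for W in (0.0, 0.5, 1.0):
    g = gametes(MOTHERS, W)
    assert isclose(sum(g.values()), 1.0)
    print(W, {k: round(v, 4) for k, v in g.items()})
```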


The distribution $P_1(i)$ of genetical types in $\Pi_1$ is obtained from formula (3·3·28). We write this distribution in the form of a three-by-three table. The marginal column and row give the distributions of types with respect to single pairs of genes.


Table 3·7
Distribution of Genetical Types in Π1

              hh       hH       Hh       HH      Total
  gg        .1650    .0825      -        0       .2475
  gG        .1850    .2525    .0375    .0300     .5050
  GG        .0500    .1275      -      .0700     .2475
  Total     .4000    .5000 (hH + Hh)   .1000    1.0000
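Formula (3·3·28) is likewise not shown here. Reading it as independent union of one maternal and one paternal gamete, the following Python sketch recomputes Table 3·7 from the gamete probabilities $A_1, \ldots, D_1$ and $a_1, \ldots, d_1$ given in the text. The type labels are those of the table.

```python
from math import isclose

TYPES = {  # unordered gamete pair -> genetical type
    ("gh", "gh"): "gg,hh", ("gh", "gH"): "gg,hH", ("gH", "gH"): "gg,HH",
    ("gh", "Gh"): "gG,hh", ("gh", "GH"): "gG,hH", ("gH", "Gh"): "gG,Hh",
    ("gH", "GH"): "gG,HH", ("Gh", "Gh"): "GG,hh", ("Gh", "GH"): "GG,hH",
    ("GH", "GH"): "GG,HH",
}
ORDER = ["gh", "gH", "Gh", "GH"]

def union(maternal, paternal):
    """Offspring distribution when a maternal and a paternal gamete unite
    independently (assumed reading of formula (3·3·28))."""
    dist = dict.fromkeys(TYPES.values(), 0.0)
    for m, pm in maternal.items():
        for f, pf in paternal.items():
            pair = tuple(sorted((m, f), key=ORDER.index))  # merge both orders
            dist[TYPES[pair]] += pm * pf
    return dist

mothers = {"gh": 0.55, "gH": 0.0, "Gh": 0.25, "GH": 0.20}   # A1, B1, C1, D1
fathers = {"gh": 0.30, "gH": 0.15, "Gh": 0.20, "GH": 0.35}  # a1, b1, c1, d1
pi1 = union(mothers, fathers)
assert isclose(sum(pi1.values()), 1.0)
for name, value in pi1.items():
    print(f"{name}: {value:.4f}")
```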


It is easy to see that, whatever W, the distribution of genetical types in $\Pi_1$ is not stable. This applies not only to the distribution of the ten genetical types with respect to two pairs of genes but also to the marginal distributions. In fact, the marginal distributions do not satisfy the necessary and sufficient condition of stability of Theorem 3·2, that

$$\sqrt{p} + \sqrt{r} = 1:$$

$$\sqrt{.4} + \sqrt{.1} \neq 1, \qquad \sqrt{.2475} + \sqrt{.2475} \neq 1.$$
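The stability test of Theorem 3·2 used above is easy to mechanize. In this Python sketch, `p` and `r` stand for the proportions of the two homozygous types at one locus; the tolerance is an arbitrary allowance for floating-point rounding, not part of the theorem.

```python
from math import sqrt, isclose

def single_locus_stable(p, r, tol=1e-9):
    """Theorem 3·2 condition: the homozygote proportions p and r belong to a
    stable distribution if and only if sqrt(p) + sqrt(r) = 1."""
    return isclose(sqrt(p) + sqrt(r), 1.0, abs_tol=tol)

print(single_locus_stable(0.4, 0.1))        # h-locus margins of Pi_1
print(single_locus_stable(0.2475, 0.2475))  # g-locus margins of Pi_1
print(single_locus_stable(0.4225, 0.1225))  # h-locus margins of Tables 3·8 to 3·10
```

The last call uses the margins .4225 and .1225 of the later generations, for which $\sqrt{.4225} + \sqrt{.1225} = .65 + .35 = 1$.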

According to the general theory of subsection 3·3·3, the marginal distributions will necessarily stabilize in $\Pi_2$, and this, of course, occurs independently of the value of W.

Table 3·8
Distribution of Genetical Types in Π2, Π3, ···
Case W = 0

              hh       hH       Hh       HH      Total
  gg        .1806    .0638      -      .0056      .25
  gG        .1912    .2338    .0338    .0412      .50
  GG        .0506    .1238      -      .0756      .25
  Total     .4225    .4550 (hH + Hh)   .1225     1.00

Table 3·9
Distribution of Genetical Types in Π2, Π3, Π4 and Π∞
Case W = .5

        Generation    hh        hH        Hh        HH       Total
  gg    Π2          .1585     .0811       -       .0104     .2500
        Π3          .1443     .0913       -       .0144
        Π4          .1341     .0980       -       .0179
        Π∞          .105625   .11375      -       .030625
  gG    Π2          .2006     .1976     .0513     .0506     .5000
        Π3          .2052     .1746     .0649     .0552
        Π4          .2079     .1583     .0760     .0579
        Π∞          .211250   .11375    .11375    .061250
  GG    Π2          .0634     .1250       -       .0616     .2500
        Π3          .0730     .1242       -       .0528
        Π4          .0806     .1227       -       .0467
        Π∞          .105625   .11375      -       .030625
  Total Π2 to Π∞    .4225     .4550 (hH + Hh)     .1225    1.0000

Table 3·10
Distribution of Genetical Types in Π2, Π3, Π4 and Π∞
Case W = 1

        Generation    hh        hH        Hh        HH       Total
  gg    Π2          .1378     .0956       -       .0166      .25
        Π3          .1212     .1057       -       .0231
        Π4          .1133     .1100       -       .0267
        Π∞          .105625   .113750     -       .030625
  gG    Π2          .2070     .1643     .0718     .0570      .50
        Π3          .2102     .1379     .0917     .0602
        Π4          .2110     .1256     .1025     .0610
        Π∞          .21125    .11375    .11375    .06125
  GG    Π2          .0777     .1233       -       .0490      .25
        Π3          .0911     .1196       -       .0393
        Π4          .0982     .1170       -       .0348
        Π∞          .105625   .11375      -       .030625
  Total Π2 to Π∞    .4225     .4550 (hH + Hh)     .1225    1.0000

For the stabilization in $\Pi_2$ of the distribution of the ten genetical types, it is necessary and sufficient that either W = 0 or, as given in (3·3·35),


$$T_2 = \tfrac{1}{4}\left[A_1D_1 - B_1C_1 + a_1d_1 - b_1c_1 + (1 - W)(A_1d_1 - B_1c_1 + D_1a_1 - C_1b_1)\right] = 0.$$

This equation can be solved with respect to W, giving

$$W = \frac{A_1D_1 - B_1C_1 + a_1d_1 - b_1c_1 + A_1d_1 - B_1c_1 + D_1a_1 - C_1b_1}{A_1d_1 - B_1c_1 + D_1a_1 - C_1b_1} = W', \text{ say.}$$











If the solution is a number between zero and unity, then, with this particular value of W, stabilization will occur in the second generation born. However, if $W'$ is either negative or greater than unity, the only case of stabilization is when W = 0. In the present example,

$$T_2 = 0.1000 - 0.05375\,W,$$

so that $W' = 0.1000/0.05375 \approx 1.86$, and it is seen that $W' > 1$. Thus, the only case in which the distribution of the ten genetical types will stabilize is when W = 0. In all other cases the distributions of the ten genetical types in successive generations $\Pi_2, \Pi_3, \ldots, \Pi_n, \ldots$ will all be different and will tend to a limiting distribution P(i) given by formula (3·3·36).
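The figures in this paragraph can be checked directly. The Python sketch below evaluates $T_2$ of (3·3·35) from the gamete probabilities of the example and solves for $W'$. It takes the leading factor as 1/4, the factor that reproduces $T_2 = 0.1000 - 0.05375W$.

```python
from math import isclose

def T2(W, maternal, paternal):
    """T2 of (3·3·35) with leading factor 1/4; gametes ordered (gh, gH, Gh, GH)."""
    A, B, C, D = maternal
    a, b, c, d = paternal
    free = A * D - B * C + a * d - b * c      # part not multiplied by (1 - W)
    linked = A * d - B * c + D * a - C * b    # part multiplied by (1 - W)
    return (free + (1 - W) * linked) / 4

def W_prime(maternal, paternal):
    """Root of T2(W) = 0, i.e. (free + linked) / linked."""
    A, B, C, D = maternal
    a, b, c, d = paternal
    free = A * D - B * C + a * d - b * c
    linked = A * d - B * c + D * a - C * b
    return (free + linked) / linked

mothers = (0.55, 0.0, 0.25, 0.20)   # A1, B1, C1, D1
fathers = (0.30, 0.15, 0.20, 0.35)  # a1, b1, c1, d1
assert isclose(T2(W_prime(mothers, fathers), mothers, fathers), 0.0, abs_tol=1e-12)
print("T2(0) =", round(T2(0, mothers, fathers), 5))
print("slope =", round(T2(0, mothers, fathers) - T2(1, mothers, fathers), 5))
print("W'    =", round(W_prime(mothers, fathers), 4))
```

Since $W' > 1$ lies outside the admissible range of a probability, no W > 0 yields stabilization in $\Pi_2$ for this example.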

★Tables 3·8, 3·9 and 3·10 give the distributions in successive generations corresponding to the three different values of W.

★It will be noted that the marginal distributions are the same for all values of W and for all successive generations from $\Pi_2$ on.

★ PROBLEMS AND EXERCISES 

1. In a population π of mothers M and fathers F, the distribution of genetical types is given by

  P{M : ggM} = .5        P{F : gG, hH} = .1
  P{M : ggm} = .3        P{F : gG, Hh} = .2
  P{M : GG, hH} = .1     P{F : GG, hh} = .3
  P{M : GG, HH} = .1     P{F : GG, hH} = .4.

The two pairs of genes g, G and h, H are carried in the same chromosome pair. Let W stand for the probability that the chromosome pair will break between the loci of these genes. Mating in π and in successive generations is panmixia. There is no selection. Assume W = 0 and determine the distribution of genetical types in successive generations born, beginning with the first. Will stabilization occur?

2. Same problem as in 1, but under conditions (a) W = ½ and (b) W = 1.

3. A succession of generations is mating under panmixia and there is no selection. In the first population mating, all the males are of the same type gG, hH, while the females are either double recessives gg, hh or double dominants GG, HH. The two pairs of genes are carried in the same chromosome pair with the probability of breaking between their loci W = .2. What should the proportion $p_1$ of double recessives among the females be so that the distribution of genetical types in the successive generations becomes stabilized? Determine this stable distribution.



4. In the general conditions of problem 3 put $p_1 = .8$, $p_{10} = .2$, W = .2 and determine whether stabilization of the distribution of genetical types will ever occur. If not, determine the limiting distribution P(i), i = 1, 2, ..., 10. Is this distribution stable?

5. In a population π the distribution of the ten genetical types with respect to two pairs of linked genes is, in the usual notation,

  p1 = .01     p6 = .12
  p2 = .04     p7 = .16
  p3 = .04     p8 = .09
  p4 = .06     p9 = .24
  p5 = .08     p10 = .16.

Is this distribution stable with respect to the system $S_0$ of reproduction consisting of panmixia and no selection? Assume that the probability of the chromosome pair breaking between the two pairs of genes is (a) W = 0.1 and (b) W = 1.0. Obtain the corresponding limiting distributions and compare the speed with which the probabilities $P_n(i)$ tend to P(i) in the two cases.

6. Two pairs of genes g, G and h, H are carried in the same chromosome pair. The probability of the chromosomes breaking between the loci of these genes is W > 0. In a population π the distribution of the genetical types is stable with respect to the system of reproduction $S_0$, which consists in panmixia and no selection. It is known that the population π contains no double recessives gg, hh. Compute the distribution of genetical types in π.

7. Under the general conditions of problem 6, assume that the population π does contain double recessives gg, hh but fails to contain double hybrids of the type gG, hH. What can one say about the distribution of types in π?

8. A distribution P(i) of ten genetical types with respect to two pairs of linked genes is a limiting distribution under panmixia and no selection. Test whether or not this distribution is necessarily stable.

9. Mothers and fathers forming the first population mating $\pi_1$ have the following distributions of types with respect to two pairs of genes which are not linked with sex:


II 

P 

II 

II 

b 

q 

II 

II 

P 

9a ~ 

CO 

II 

97 — .0, 

II 

93 = -1, 

Ps = .1, 

98 = -2, 





P* “ -0, 

= 

. 1 , 

p, = .2, 

g» = .0, 

p. = ,0, 

?6 - 

.0, 

Pio = -3, 

Qio — *0. 


Consider successive generations following $\pi_1$ under panmixia and without selection. Determine all the values of the probability W with which the distribution of ten genetical types will stabilize in the second generation born. Compute the corresponding stable distributions.

10. Consider the first population mating described in the preceding problem and determine the limiting distribution P(i) of the ten genetical types. Does this limiting distribution depend on W?

11. Consider successive generations of the two preceding problems, put W = .1 and determine n so that $P_n(1)$ differs from its limiting value P(1) by less than ten percent of the latter. That is, find n so that

$$|P_n(1) - P(1)| < \frac{P(1)}{10}.$$

Is the limiting distribution necessarily stable?

12. Consider one pair of genes g, G and a variety of self-fertilizing plants. It is more or less obvious that the only distribution of types GG, gG, gg which is stable with respect to self-fertilization is that not containing any hybrids. However, give an exact proof of this proposition.

13. Consider one pair of genes g, G and a variety of plants which, in general, are self-fertilizing. Assume, however, that a certain proportion α of seeds are produced under panmixia and without selection, due to some flowers being cross-fertilized by insects. Denote by p, q, and r the proportions of dominants, hybrids, and recessives, and determine the relations between p, q, r and α so that the distribution is stable with respect to the system of reproduction just described.

Answer: Stable only if α = 1, with $p = (1 - \sqrt{r})^2$, $q = 2\sqrt{r}(1 - \sqrt{r})$; or if α = 0, with q = 0.
REFERENCES

1. F. Bernstein, "Variations- und Erblichkeitsstatistik." Handbuch der Vererbungswissenschaft, Vol. 1 (1929), p. 1.

2. S. Bernstein (translated into English by E. Lehmer), "Solution of a mathematical problem connected with the theory of heredity." Ann. Math. Stat., Vol. 13 (1942), p. 53.

3. T. Dobzhansky, Genetics and the Origin of Species. New York: Columbia University Press, 1941.

4. H. Geiringer, "On the probability theory of linkage in Mendelian heredity." Ann. Math. Stat., Vol. 15 (1944), p. 25.

5. J. B. S. Haldane, The Causes of Evolution. London: Harper, 1932.¹

6. G. H. Hardy, "Mendelian proportions in a mixed population." Science, Vol. 28 (1908), p. 49.

7. T. H. Morgan, A. H. Sturtevant, H. J. Muller, and C. B. Bridges, The Mechanism of Mendelian Heredity. New York: Henry Holt, 1915.

8. E. W. Sinnott and L. C. Dunn, Principles of Genetics. New York: McGraw-Hill, 1939.

9. S. Wright, "The distribution of self-sterility alleles in populations." Genetics, Vol. 24 (1939), p. 538.

¹ See also a long series of articles published between 1923 and 1936 by the Cambridge Philosophical Society, first in the Transactions and then in the Proceedings. These articles have the same title: "A mathematical theory of natural and artificial selection." The first memoir appeared in the Trans. Camb. Philos. Soc., Vol. 23 (1923), p. 19.



CHAPTER IV. 


Random Variables and Frequency 
Distributions 


4·1. Random Variables

4·1·1. Concept of a function. Let S denote a set of objects A. In particular, the "objects" A may be numbers and S may stand for all the numbers between zero and unity. Alternatively, the objects A may be chords in a given circle and S may stand for either the set of all chords or for the set of chords satisfying some specified condition, such as chords parallel to a fixed direction, etc. In mathematics one is frequently led to consider rules which associate a well-defined number or, occasionally, several well-defined numbers with every object A of a given set S. The number associated with a particular object A is denoted by a letter followed by parentheses around the symbol for the object. For example, we may write X(A) to denote a number associated with A by one rule and F(A) to denote the number associated with the same object by another rule, etc.

Definition 4·1. A quantity having a well-defined numerical value (or values) for each object belonging to a set S is called a numerical function defined over the set S.

Following are some examples. 

(i) Let $S_1$ consist of all numbers, say x, between the numbers $-1$ and $+1$, inclusive. Further, let the quantity $F_1(x)$ be defined by the following rule $R_1$: if x is a rational number, then $F_1(x) = 1$; if x is an irrational number, then $F_1(x) = 0$. Obviously, this rule associates a well-defined number (unity or zero) with each and every object forming the set $S_1$. Thus $F_1(x)$ is a numerical function defined over the set $S_1$.

(ii) Consider now the same set $S_1$ and a rule $R_2$ associating with each object x of $S_1$ (each number between $-1$ and $+1$) a number $f_1(x)$ computed by multiplying x by itself. In other words, $f_1(x) = x^2$. Obviously, $f_1(x)$ so defined is a numerical function defined over $S_1$.

It will be noticed that both $F_1(x)$ and $f_1(x)$ have just one value associated with any particular element of $S_1$. Therefore, they are described as single-valued functions defined over $S_1$.

(iii) Rules $R_1$ and $R_2$, which were used to define functions over the set $S_1$, can be used to define functions over sets other than $S_1$. For example, we may apply the same rules to define functions, say

$F_2(x)$ and $f_2(x)$ over the set $S_2$ of numbers x such that $-2 \le x \le 2$,

$F_3(x)$ and $f_3(x)$ over the set $S_3$ of numbers x such that $-3 \le x \le 3$,

etc. Finally, we may use the same rules to define the functions, say $F_0(x)$ and $f_0(x)$, over the inclusive set $S_0$ of real numbers x extending from $-\infty$ to $+\infty$. Although the definition is the same in all cases, the functions defined are considered different functions because they are defined over different sets.

(iv) There are rules defining numerical functions over some sets of numbers which cannot be used to define functions over some other sets. Consider, for example, the following rule $R_3$: Y(x) is a real number such that its square is equal to x. The widest set of real numbers over which the rule defines the function Y(x) is the set, say $S^*$, composed of all non-negative numbers x, i.e., numbers satisfying the condition $0 \le x < +\infty$. Since the square of any real number is zero or positive, it follows that rule $R_3$ does not apply to negative numbers.

Another interesting detail concerning rule $R_3$ is that the function Y(x) defined by this rule over the set $S^*$ is not single-valued. In fact, if x' is a positive number then both $-\sqrt{x'}$ and $+\sqrt{x'}$ satisfy the definition of Y(x').

(v) Thus far we have considered numerical functions defined over sets of numbers. To illustrate the concept of a function defined over a set of objects other than numbers, let S stand for the set of all chords in a circle with radius unity. Thus, each object A belonging to S is a chord. Further, let Z(A) be defined as the length of the chord A. Obviously, Z(A) is a single-valued numerical function defined over the set S. So is W(A), the distance between the chord A and the center of the circle.
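Example (v) can be made concrete by describing a chord A of the unit circle through the polar angles $\theta_1, \theta_2$ of its endpoints. This representation is chosen here only for illustration. Because $(\theta_1, \theta_2)$ and $(\theta_2, \theta_1)$, or an angle shifted by $2\pi$, describe the same chord, a single-valued function of the chord must give the same value for every such description. The absolute values in the formulas below ensure this.

```python
from math import sin, cos, pi, isclose

def Z(theta1, theta2):
    """Length of the chord whose endpoints are at polar angles theta1, theta2."""
    return 2 * abs(sin((theta1 - theta2) / 2))

def W(theta1, theta2):
    """Distance from the centre of the unit circle to the same chord."""
    return abs(cos((theta1 - theta2) / 2))

chord = (0.3, 2.1)        # an arbitrary chord
same_chord = (2.1, 0.3)   # the same chord, endpoints listed in the other order
assert isclose(Z(*chord), Z(*same_chord)) and isclose(W(*chord), W(*same_chord))
# Half the chord and its distance from the centre are the legs of a right
# triangle whose hypotenuse is the radius 1.
assert isclose((Z(*chord) / 2) ** 2 + W(*chord) ** 2, 1.0)
print("diameter:", Z(0, pi), W(0, pi))
```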

The concept of a function should be distinguished from that of the value or values assumed by the function. The functions $F_1(x)$ and $F_2(x)$ defined in (i) and (iii) by rule $R_1$ are capable of assuming the same values, namely unity or zero, and no others. Thus, the sets of values of these functions are identical, yet the two functions are different: $F_1(x)$ is defined over the set $S_1$ and $F_2(x)$ over the set $S_2$. To emphasize the distinction between the concept of a function and that of the set of values of the function, consider the function defined over the set $S_1$ as follows: whatever the number x of the set $S_1$, the function $\phi(x) = 1 - F_1(x)$. Obviously, just as with $F_1(x)$, the function $\phi(x)$ can assume the values unity or zero and no others. Moreover, both $F_1(x)$ and $\phi(x)$ are defined over the same set $S_1$. Yet they are different functions because the associations between the numbers of the set $S_1$ and the values of $F_1(x)$ and $\phi(x)$ are different; whenever $F_1(x)$ is equal to zero, the function $\phi(x)$ is equal to unity and vice versa.

When discussing functions, it is frequently convenient to use the word argument to describe the kind of objects which form the set over which a given function is defined. Thus we say that the argument of the function Z(A) is the chord A in a circle of unit radius. The argument of the function $F_1(x)$ is the real number x between $-1$ and $+1$, etc.

PROBLEMS AND EXERCISES 

Answer the following questions for each of the functions defined in 
questions 1 through 18: (a) What is the widest set of objects over which 
the function is defined? (b) What is the set of values of the function? 
(c) Is the function single-valued? (d) In Problems 13 through 18, try to 
find a way of representing the functions other than in words. 

1. Y(x) = sin x.

2. Y(x) = arcsin x.

3. Y(x) = sec x.

4. Y(x) = 2 + sec x.

5. Y(x) = arcsec x.

6. Y(x) = tang x.

7. Y(x) = -3 + tang x.

8. Y(x) = arctang x.

9. Y(x) =

10. Y(x) =

11. Y(x) = log x.

12. Y(x) = √(log x).

13. Y(x) = number of days from September 1 to Labor Day in year x.

14. Y(x) = the amount of postage required on a letter weighing x ounces to be delivered within the United States.

15. Y(x) = the number of even whole positive numbers less than x.

16. Y(x) = the area of a circle of radius x.

17. Y(x) = the altitude in feet of the point within the United States x miles due west of Washington, D. C.

18. Y(x) = the length of time in seconds for the light from the sun to travel to the earth when x is the distance between the two.

4·1·2. Random variable. Let a fundamental probability set (F.P.S.) be composed of some objects A.

Definition 4·2. Every single-valued numerical function X(A) defined over the fundamental probability set is called a random variable.

The following simple theorems describe some of the most important 
properties of random variables. 

Theorem 4·1. If X is a random variable, then, whatever be the real number t, there exists the probability, say

$$p_X(t) = P\{X = t\},$$

that X will assume the value t.



To prove Theorem 4·1, notice that if X is a random variable then it must be a single-valued function defined over the F.P.S. Hence, to every object A belonging to the F.P.S. there corresponds a well-defined unique value of X, say X(A). This value of X is, then, a property of the object A. To determine the probability $p_X(t)$, it is sufficient to count those elements of the F.P.S. for which X(A) = t and to divide their number, say n(X = t), by the total number N of elements of the F.P.S. Thus

$$p_X(t) = P\{X = t\} = \frac{n(X = t)}{N}.$$

It will be noticed that the probability $p_X(t) = P\{X = t\}$ is defined for all real numbers t. In other words, it is a single-valued function defined over the set of all real numbers t.

Definition 4·3. The function $p_X(t) = P\{X = t\}$ defined over the set of all real numbers t and representing the probability that the random variable X will assume the value t is called the frequency function of the random variable X.

Roughly speaking, the frequency function of a random variable X determines, for each value t, how frequently X is equal to t.

Theorem 4·2. If X is a random variable, then, whatever be the real number t, there exists the probability, say

$$F_X(t) = P\{X \le t\},$$

that the value assumed by X will not exceed t.

The proof of Theorem 4·2 is very similar to that of Theorem 4·1. In order to compute $F_X(t)$ it is sufficient to determine the number, say $n(X \le t)$, of those elements of the F.P.S. for which the single-valued function X = X(A) does not exceed t and to divide it by the total number N of elements of the F.P.S.:

$$F_X(t) = P\{X \le t\} = \frac{n(X \le t)}{N}.$$

The probability $F_X(t)$, the existence of which is asserted by Theorem 4·2, is defined over the set of all real numbers t.

Definition 4·4. The function $F_X(t) = P\{X \le t\}$ defined over the set of all real numbers t and representing the probability that the random variable X will not exceed t is called the distribution function of the random variable X.
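Theorems 4·1 and 4·2 and Definitions 4·3 and 4·4 amount to counting over a finite F.P.S. A short Python sketch, with the F.P.S. given as a list of objects and X as any single-valued function on them. The die of Example 1 serves as input; its side labels are hypothetical names introduced only for the illustration.

```python
from fractions import Fraction

def frequency_function(fps, X, t):
    """p_X(t) = n(X = t) / N, counted over a finite F.P.S. given as a list."""
    return Fraction(sum(1 for A in fps if X(A) == t), len(fps))

def distribution_function(fps, X, t):
    """F_X(t) = n(X <= t) / N."""
    return Fraction(sum(1 for A in fps if X(A) <= t), len(fps))

# The die of Example 1: the objects are the six sides (hypothetical labels)
# and X(A) is the number of dots on side A.
DOTS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6}
sides = list(DOTS)
X = DOTS.__getitem__

for t in (0.5, 1, 2.5, 6, 7):
    print(t, frequency_function(sides, X, t), distribution_function(sides, X, t))
```

Both functions accept any real t, as the definitions require; for values of t that X never takes, the count, and hence $p_X(t)$, is simply zero.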

Remark: The terminology relating to the functions $p_X(t)$ and $F_X(t)$ is not quite established, so that one frequently finds terms in the literature other than those used here. Frequently these functions are referred to as the probability laws of the variable X. The adjective "cumulative" is attached to either "distribution" or "probability law" when used to denote $F_X(t)$. When $p_X(t)$ is known, then $F_X(t)$ is easy to compute and vice versa. For this reason we shall occasionally use the term "distribution of X" to denote either $p_X(t)$ or $F_X(t)$.

Example 1. Let the F.P.S. consist of the six sides of an ordinary die, 
and let X stand for the number of dots on a given side. Obviously, X is a 
single-valued numerical function defined over the F.P.S. and, therefore, a 
random variable. 

The possible values of X are 1, 2, 3, 4, 5, and 6 and each is assumed for just one element of the F.P.S. Consequently,

$$p_X(1) = p_X(2) = \cdots = p_X(6) = \tfrac{1}{6}.$$

If t stands for any number other than 1, 2, 3, 4, 5, 6, then

$$p_X(t) = P\{X = t\} = 0.$$

This completes the definition of the frequency function of X; it is equal to 1/6 for t = 1, 2, 3, 4, 5, 6 and to zero for all other values of t.

To define the distribution function of X, first notice that the function

$$F_X(t) = P\{X \le t\}$$

is equal to zero for all values of t < 1 because X cannot be less than unity. Further, if t is any number between the limits $1 \le t < 2$, then the only way in which X can be $\le t$ is by being unity. Thus for $1 \le t < 2$,

$$F_X(t) = P\{X \le t\} = P\{X = 1\} = p_X(1) = \tfrac{1}{6}.$$

Similarly, if $2 \le t < 3$, then the variable X can satisfy the condition $X \le t$ either by equaling unity or by equaling 2. Since these two circumstances are exclusive, we have

$$F_X(t) = P\{X \le t\} = P\{(X = 1) + (X = 2)\} = P\{X = 1\} + P\{X = 2\} = p_X(1) + p_X(2) = \tfrac{2}{6}.$$

Proceeding in the same manner it is easily found that, for all values of t between the limits

$$k \le t < k + 1,$$

where k is an integer from 1 to 5, the probability distribution function

$$F_X(t) = \frac{k}{6}.$$

Finally, if $t \ge 6$, then X is certain to satisfy the condition $X \le t$. It follows that, for $t \ge 6$, the function $F_X(t) = 1$.

Figures 11 and 12 give the graphs of the functions $p_X(t)$ and $F_X(t)$, respectively. The first graph coincides with the axis of t, except at the points corresponding to t = 1, 2, 3, 4, 5, 6. At these points $p_X(t) = 1/6$ and, therefore, the graph of $p_X(t)$ has six points above the t axis, each at the horizontal level of 1/6.

The graph of $F_X(t)$ starts with the interval from $-\infty$ to t = 1 on the t axis. Then there are six "steps" corresponding to t = 1, 2, 3, 4, 5, and 6. At the sixth step, $F_X(t)$ reaches the value of unity and remains at unity for all $t \ge 6$.

Figure 11. Graph of the Frequency Function


Functions with graphs resembling that of $F_X(t)$ are sometimes described as step-functions. They are constant over certain intervals of the independent variable and have sudden increases or decreases of values at the ends of these intervals.

Example 2. Let the fundamental probability set S consist of all the 
different groups of 8 cards which can be selected out of a 20-card deck 
made up of only aces, kings, queens, jacks, and tens. Let X stand for the 
number of aces in the group of 8 cards and Y for the number of kings. 

Obviously both X and Y are single-valued numerical functions defined over the F.P.S. Hence they are random variables. Since an eight-card hand must contain no aces or one ace or two or three or four aces and no more, the random variable X is capable of possessing only five values, 0, 1, 2, 3, and 4. These are also the possible values of Y. The student will have no difficulty in perceiving that the similarity between the random variables X and Y extends further and that the frequency functions of the two variables, and also their distribution functions, are identical,

$$p_X(t) = p_Y(t), \qquad F_X(t) = F_Y(t)$$

for all values of t. Nevertheless, the variables X and Y are different. For example, an eight-card hand may include four aces and no kings. Then X = 4 and Y = 0.

Figure 12. Graph of the Distribution Function $F_X(t)$.


To determine the frequency function of X, we follow the steps indicated in Example 1. Since the only possible values of X are t = 0, 1, 2, 3, and 4, we have $p_X(t) = P\{X = t\} = 0$ for any other value of t. If t stands for one of the five integers 0, 1, 2, 3, 4, then to determine the corresponding $p_X(t)$ one must compute the probability $P\{X = t\}$ that an eight-card hand selected from the 20-card deck will contain exactly t aces. Probabilities of this kind were computed in Chapter 2, and the student will have no difficulty in verifying that

$$p_X(t) = P\{X = t\} = \frac{C_4^t\,C_{16}^{8-t}}{C_{20}^{8}} = \frac{4!\,8!\,12!\,16!}{20!\,t!\,(4-t)!\,(8-t)!\,(8+t)!}.$$

Substituting the consecutive values of t = 0, 1, 2, 3, 4 and performing easy arithmetic, we obtain the values of $p_X(t)$ in the following table.



Table 4·1
Frequency Functions of X and Y

    t       p_X(t) = p_Y(t)
    0           .1022
    1           .3633
    2           .3814
    3           .1387
    4           .0144
  Total        1.0000

The table of values of $p_X(t)$ may be used to compute the distribution function of X for all values of t. Since X cannot have negative values, it follows that

$$F_X(t) = P\{X \le t\} = 0$$

for all values of t less than zero. If $0 \le t < 1$, then the only way in which X can satisfy the condition $X \le t$ is by equaling zero. Hence, for $0 \le t < 1$,

$$F_X(t) = P\{X \le t\} = P\{X = 0\} = p_X(0) = .1022.$$

As in Example 1, the distribution function $F_X(t)$ is a step function with jumps at 0, 1, 2, 3, and 4. In fact, following the reasoning which by now must be familiar to the student, we find the following table:

Table 4·2
Distribution Function of X and of Y

  t              F_X(t) = F_Y(t)
  t < 0          0
  0 ≤ t < 1      p_X(0) = .1022
  1 ≤ t < 2      p_X(0) + p_X(1) = .4655
  2 ≤ t < 3      p_X(0) + p_X(1) + p_X(2) = .8469
  3 ≤ t < 4      p_X(0) + p_X(1) + p_X(2) + p_X(3) = .9856
  4 ≤ t          p_X(0) + p_X(1) + p_X(2) + p_X(3) + p_X(4) = 1.0000
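Tables 4·1 and 4·2 can be recomputed from the formula for $p_X(t)$ with exact fractions. The Python sketch below uses the standard-library `math.comb(n, k)` for $C_n^k$. Computed exactly, $F_X(t)$ for $1 \le t < 2$ comes out as .4654; the .4655 of Table 4·2 is the sum of the rounded entries .1022 + .3633.

```python
from fractions import Fraction
from math import comb

def p_X(t):
    """P{X = t}: exactly t aces among 8 cards drawn from 4 aces and 16 others."""
    if t != int(t) or not 0 <= t <= 4:
        return Fraction(0)          # X takes only the values 0, 1, 2, 3, 4
    t = int(t)
    return Fraction(comb(4, t) * comb(16, 8 - t), comb(20, 8))

def F_X(t):
    """P{X <= t}: sum of p_X(k) over the possible values k not exceeding t."""
    return sum((p_X(k) for k in range(5) if k <= t), Fraction(0))

for t in range(5):
    print(t, f"{float(p_X(t)):.4f}", f"{float(F_X(t)):.4f}")
```

Because `F_X` accepts any real t, evaluating it at, say, 1.5 or -0.5 reproduces the constant stretches of the step function between the jumps.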


Figures 13 and 14 give graphs of the frequency function $p_X(t)$ and the distribution function $F_X(t)$, respectively. Although these graphs differ in some details from those in Example 1, they have many characteristics in common. In both cases, all the points of the graph of $p_X(t)$ lie on the t-axis with the exception of those corresponding to the possible values of the random variable concerned. At these values of t, the graph of $p_X(t)$ has points above the t-axis at levels equal to $p_X(t)$. In Example 1 all the points are on the same horizontal level. In the present example the levels differ because the values 0, 1, 2, 3, and 4 are assumed by X with different relative frequencies, the least frequencies corresponding to the extreme values 0 and 4.

Figure 13. Frequency Functions $p_X(t) = p_Y(t)$.


In both cases the distribution functions are step-functions, rising from the value zero to the value unity by as many steps as there are possible values of the random variable X. In Example 1 the steps are of equal height, namely, 1/6. In the present example the steps are smaller at the ends of the range of possible values of X and larger in the middle.

Example 3. Consider the fundamental probability set S and the random variable X as defined in Example 2. Let Z be defined by the relation

$$Z = 8 - X.$$

In other words, for each group of eight cards, the variable Z represents the number of cards the group includes which are other than aces. Since Z has a well-defined unique value for each element of the F.P.S., it is a single-valued function defined over the F.P.S. and, therefore, a random variable. The possible values of Z are 4, 5, 6, 7, and 8. The student will have no difficulty in computing the frequency function and the distribution function of Z.

This example illustrates the general fact that, if X is a random variable, then every single-valued function of X is also a random variable whose distribution may be obtained from that of X. More generally, every single-valued function of several random variables, say $X_1, X_2, \ldots, X_n$, is also a random variable whose frequency function and distribution function can be obtained from the joint distribution of $X_1, X_2, \ldots, X_n$.
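The general fact just stated suggests a direct computation: the frequency function of a single-valued function g(X) is obtained by adding $p_X(t)$ over all t that give the same value of g. The Python sketch below applies this to Z = 8 - X of Example 3. For contrast, it also treats a function that is not one-to-one, $(X - 2)^2$, chosen purely for illustration.

```python
from fractions import Fraction
from math import comb

def p_X(t):
    """Frequency function of X = number of aces in Example 2."""
    if t != int(t) or not 0 <= t <= 4:
        return Fraction(0)
    t = int(t)
    return Fraction(comb(4, t) * comb(16, 8 - t), comb(20, 8))

def distribution_of_function(g, p, support):
    """Frequency function of g(X): add p(t) over all t with g(t) = z."""
    result = {}
    for t in support:
        z = g(t)
        result[z] = result.get(z, Fraction(0)) + p(t)
    return result

p_Z = distribution_of_function(lambda x: 8 - x, p_X, range(5))
for z in sorted(p_Z):
    print(z, f"{float(p_Z[z]):.4f}")

# A function that is not one-to-one merges several values of X.
p_sq = distribution_of_function(lambda x: (x - 2) ** 2, p_X, range(5))
```

For Z = 8 - X, each value z comes from exactly one value of X, so $p_Z(z) = p_X(8 - z)$. For $(X - 2)^2$, the values 1 and 3 of X both map to 1 and are added together.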


Figure 14. Distribution Functions $F_X(t) = F_Y(t)$.


4·1·3. Joint distribution of several random variables. In many problems of statistics it is necessary to consider simultaneously a number of random variables. Various concepts connected with this situation are discussed in the second volume of this text. Here we shall use Example 2 of the foregoing subsection to introduce the reader to the concepts of the joint frequency function and of the joint distribution function of two random variables. These concepts are then readily generalized for an arbitrary number of random variables considered jointly.

Let X and Y denote two random variables defined over the same F.P.S. 
To have something concrete in mind, the reader may think of the random 
variables X and Y defined in Example 2 of subsection 4 • 1 - 2. Let t and t 
be any real numbers. Using the reasoning which proved Theorem 4-1, the 
reader will have no difficulty in establishing that, whatever t and t, the 
definition of X and Y determines the probability P((X = t){X = 7)] 
that the random variable X will equal t and simultaneously the random 



174 RANDOM VARIABLES [4*1*3] 

variable Y will equal r. In order to compute this probability it is sufficient 
to count the elements of the F.P.S. for which X — t and simultaneously 
Y = r and to divide the number of these elements, say nfC-Y == t)(Y = <)]; 
by the total number N of elements in the F.P.S. It follows that, whatever 
be the two random variables X and F, the probability P{ (X = t)(Y == r)} 
is well defined. Obviously, also, this probability is a function of t and r. 
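The counting rule just described can be carried out literally by a computer. The Python sketch below rests on an assumption: it takes the F.P.S. of Example 2 to consist of all groups of 8 cards chosen from 20 cards, of which 4 are counted by X and 4 by Y, as formula (4·1·1) suggests. The card labels and function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical card set (an assumption, not stated in this subsection): 20 cards,
# 4 marked "A" (counted by X), 4 marked "K" (counted by Y) and 12 others marked "o".
CARDS = ["A"] * 4 + ["K"] * 4 + ["o"] * 12

def count_joint(group_size=8):
    """Count the elements of the F.P.S. (all groups of `group_size` cards) for each
    pair of values (t, tau) of X and Y; return the counts and the total number N."""
    counts = Counter()
    N = 0
    # combinations() picks positions, so cards with equal labels are still distinct cards
    for group in combinations(CARDS, group_size):
        N += 1
        counts[(group.count("A"), group.count("K"))] += 1
    return counts, N

counts, N = count_joint()

def p_XY(t, tau):
    """P{(X = t)(Y = tau)} = n{(X = t)(Y = tau)} / N; zero for pairs that never occur."""
    return Fraction(counts[(t, tau)], N)

print(N, counts[(1, 1)], round(float(p_XY(1, 1)), 4))
```

With this assumed card set the enumeration finds N = 125,970 groups, of which 14,784 have X = 1 and Y = 1.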

Definition 4·5. The function of two real variables t and τ, defined for
$-\infty < t < +\infty$ and $-\infty < \tau < +\infty$ as the probability $P\{(X = t)(Y = \tau)\}$
that the random variable X will equal t and, simultaneously, the random
variable Y will equal τ, is called the joint frequency function of X and Y.

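Table 4·3

Joint Frequency Function of Random Variables X and Y
$p_{X,Y}(t, \tau) = C_t^4\, C_\tau^4\, C_{8-t-\tau}^{12} / C_8^{20}$

  τ \ t     0       1       2       3       4
    0     .0039   .0251   .0440   .0251   .0039
    1     .0251   .1174   .1509   .0629   .0070
    2     .0440   .1509   .1415   .0419   .0031
    3     .0251   .0629   .0419   .0084   .0004
    4     .0039   .0070   .0031   .0004   .0000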

The joint frequency function of X and Y (or the frequency function of
X and Y for short) is denoted by the lower-case letter p with X, Y as
subscripts and with t and τ in parentheses following the main symbol p.
Thus,

$$p_{X,Y}(t, \tau) = P\{(X = t)(Y = \tau)\}.$$


Turning to Example 2 of subsection 4·1·2, it will be seen that $p_{X,Y}(t, \tau)$
is equal to zero for all combinations of values of t and τ except those where
t = 0, 1, 2, 3, 4 and, at the same time, τ = 0, 1, 2, 3, 4. If both t and τ
are non-negative integers less than five, then the reader will have no
difficulty in checking that

$$p_{X,Y}(t, \tau) = \frac{C_t^4\, C_\tau^4\, C_{8-t-\tau}^{12}}{C_8^{20}}. \tag{4·1·1}$$

Since the joint frequency function of the two random variables depends
on two arguments t and τ, there are obvious difficulties in representing
this function graphically. The best we can do to obtain a quantitative
idea of the function $p_{X,Y}(t, \tau)$ is to construct a table of its non-zero values.
Table 4·3 gives the values of the particular frequency function (4·1·1).

Definition 4·6. The function of two real variables t and τ, defined for
$-\infty < t < +\infty$ and $-\infty < \tau < +\infty$ as the probability $P\{(X \le t)(Y \le \tau)\}$
that the random variable X will not exceed t and, simultaneously, the random
variable Y will not exceed τ, is called the joint distribution function of the
two random variables X and Y.

The joint distribution function of two variables X and Y is denoted by
the capital letter F with subscripts X, Y and with arguments t, τ in
parentheses immediately following the main symbol F. Thus

$$F_{X,Y}(t, \tau) = P\{(X \le t)(Y \le \tau)\}.$$

Table 4·4

Joint Distribution Function of Random Variables X and Y

  τ \ t     0      1      2      3      4
    0     .004   .029   .073   .098   .102
    1     .029   .172   .366   .454   .465
    2     .073   .366   .703   .833   .847
    3     .098   .454   .833   .971   .985
    4     .102   .465   .847   .985  1.000

Once the reader has mastered the connection between the frequency
function and the distribution function of one variable, he will have no
difficulty in understanding the same relation in the case of two random
variables. In particular, the reader will easily verify the figures in Table 4·4,
which represent the distribution function of the variables X and Y of Example
2 of subsection 4·1·2.
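The verification suggested above amounts to adding values of the joint frequency function over the quadrant $i \le t$, $j \le \tau$. The Python sketch below does this with formula (4·1·1). The function names are hypothetical. The printed entries of the table are rounded, so an exactly computed value may differ from Table 4·4 by one unit in the third decimal. For instance, $F_{X,Y}(1, 3)$ computed exactly is about .4545 against the printed .454. For that reason the spot check allows a tolerance of .001.

```python
from fractions import Fraction
from math import comb

def p_XY(t, tau):
    """Joint frequency function (4·1·1) for integers t, tau in 0..4."""
    return Fraction(comb(4, t) * comb(4, tau) * comb(12, 8 - t - tau), comb(20, 8))

def F_XY(t, tau):
    """Joint distribution function P{(X <= t)(Y <= tau)}: add p_XY(i, j) over i <= t and j <= tau."""
    return sum((p_XY(i, j) for i in range(5) for j in range(5) if i <= t and j <= tau),
               Fraction(0))

# Spot check against the row tau = 3 of Table 4·4.
printed_row = [0.098, 0.454, 0.833, 0.971, 0.985]
for t, printed in enumerate(printed_row):
    exact = float(F_XY(t, 3))
    print(t, round(exact, 4), printed, abs(exact - printed) < 0.001)
```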

4·1·4. General properties of distribution. The examples of subsection
4·1·2 bring out several important properties of frequency functions and
the corresponding distribution functions. These general properties will now
be recorded in the following theorems.

Theorem 4·3. A random variable cannot possess more different values
than there are elements in the fundamental probability set. The proof of
Theorem 4·3 is obvious. An important consequence of this theorem is that,
as long as we limit our considerations to finite fundamental probability
sets, the random variables defined on these sets can possess only a finite
number of different values. In the theory of probability treated on a more
advanced level, this restriction is not valid. Unless specifically mentioned,
all the random variables treated in this book may possess only a finite
set of different values.

Theorem 4·4. If X is a random variable capable of assuming values

$$u_1 < u_2 < \cdots < u_n$$

and no others, then the frequency function $p_X(t)$ of the variable X is equal to
zero for all values of t different from $u_1, u_2, \ldots, u_n$, and

$$p_X(u_k) = P\{X = u_k\}$$

for k = 1, 2, ..., n. The proof of this theorem is also obvious.

Theorem 4·5. If X is a random variable capable of assuming values

$$u_1 < u_2 < \cdots < u_n$$

and no others, then the sum of the values of the frequency function $p_X(t)$ at
the points $u_1, u_2, \ldots, u_n$ is equal to unity,

$$p_X(u_1) + p_X(u_2) + \cdots + p_X(u_n) = 1.$$

Proof. Since the random variable X is a numerical function defined
over the F.P.S. and since it does not assume any values other than
$u_1, u_2, \ldots, u_n$, it follows that the logical sum

$$(X = u_1) + (X = u_2) + \cdots + (X = u_n)$$

is a sure property in the F.P.S. considered. Therefore

$$P\{(X = u_1) + (X = u_2) + \cdots + (X = u_n)\} = 1.$$

Since by definition X is a single-valued function defined over the F.P.S.,
it follows that $(X = u_i)$ and $(X = u_j)$ are exclusive properties of the
elements of the F.P.S. for $i \ne j$. In other words, if for a given element of
the F.P.S. the function X has the value $u_i$, it cannot have the value
$u_j \ne u_i$ at the same time. In consequence, application of the addition
theorem gives

$$P\Big\{\sum_{i=1}^{n} (X = u_i)\Big\} = \sum_{i=1}^{n} P\{X = u_i\} = \sum_{i=1}^{n} p_X(u_i) = 1.$$

Q.E.D.

Theorem 4·6. If a random variable X is capable of assuming values

$$u_1 < u_2 < \cdots < u_m < u_{m+1} < \cdots < u_n$$

and no others, then the distribution function $F_X(t)$ of the variable X is a
non-decreasing step-function equal to zero for all values of $t < u_1$, equal to
unity for all $t \ge u_n$, and

$$F_X(t) = \sum_{i=1}^{m} p_X(u_i)$$

for all values of t between the limits $u_m \le t < u_{m+1}$, where
m = 1, 2, ..., n − 1.

In order to prove Theorem 4·6, it is sufficient to repeat the reasoning we
used in Example 1 of subsection 4·1·2 in determining the probability
distribution of the variable considered there.
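Theorem 4·6 describes how the step-function $F_X(t)$ is assembled from the values $p_X(u_i)$. The minimal Python sketch below builds such a function from the sorted values and their probabilities. The function names and the frequency function in the usage line are hypothetical. Exact fractions are assumed so that the check required by Theorem 4·5 is exact.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

def make_distribution_function(values, probs):
    """Return F_X for a variable taking the values u_1 < u_2 < ... < u_n with
    probabilities p_X(u_1), ..., p_X(u_n)."""
    if any(a >= b for a, b in zip(values, values[1:])):
        raise ValueError("values must be strictly increasing")
    if sum(probs) != 1:
        raise ValueError("the probabilities must add up to unity (Theorem 4·5)")
    cumulative = [Fraction(0)] + list(accumulate(probs))   # cumulative[m] = p_X(u_1) + ... + p_X(u_m)

    def F(t):
        # bisect_right gives the number m of values u_i not exceeding t,
        # so u_m <= t < u_{m+1}; m = 0 when t < u_1 and m = n when t >= u_n
        return cumulative[bisect_right(values, t)]

    return F

# Hypothetical example: values 0, 1, 2 with probabilities 1/4, 1/2, 1/4.
F = make_distribution_function([0, 1, 2], [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)])
print([str(F(t)) for t in (-1, 0, 0.5, 1, 1.5, 2, 3)])
```

The binary search mirrors the statement of the theorem. Locating the index m with $u_m \le t < u_{m+1}$ is all that is needed, because $F_X$ is constant between successive possible values.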

Theorem 4·7. If $F_X(t)$ is the distribution function of a random variable
X, then the difference between the values of $F_X(t)$ at any two points
$t_1 < t_2$ is equal to the probability that the value of X will lie between the
limits $t_1 < X \le t_2$.

Proof. The variable X can satisfy the condition $X \le t_2$ in two
exclusive ways: either $X \le t_1$ or else $t_1 < X \le t_2$. Consequently,

$$F_X(t_2) = P\{X \le t_2\} = P\{(X \le t_1) + (t_1 < X \le t_2)\}.$$

Because $(X \le t_1)$ and $(t_1 < X \le t_2)$ are exclusive properties, the addition
theorem gives

$$F_X(t_2) = P\{X \le t_1\} + P\{t_1 < X \le t_2\}.$$

But, by definition,

$$P\{X \le t_1\} = F_X(t_1).$$

Therefore,

$$F_X(t_2) = F_X(t_1) + P\{t_1 < X \le t_2\}$$

or

$$P\{t_1 < X \le t_2\} = F_X(t_2) - F_X(t_1).$$

Q.E.D.
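Theorem 4·7 can be illustrated by computing $P\{t_1 < X \le t_2\}$ in two ways: directly from the frequency function, and as $F_X(t_2) - F_X(t_1)$. The frequency function below is hypothetical and chosen only for illustration. Note that the interval is open on the left and closed on the right, exactly as in the theorem.

```python
from fractions import Fraction

# Hypothetical frequency function: possible values and their probabilities.
p_X = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def F_X(t):
    """Distribution function F_X(t) = P{X <= t}."""
    return sum((prob for u, prob in p_X.items() if u <= t), Fraction(0))

def prob_between(t1, t2):
    """P{t1 < X <= t2}, found directly by adding p_X(u) over the values with t1 < u <= t2."""
    return sum((prob for u, prob in p_X.items() if t1 < u <= t2), Fraction(0))

t1, t2 = 0.5, 2
# Theorem 4·7: the direct sum equals the difference of the distribution function
print(prob_between(t1, t2), F_X(t2) - F_X(t1))
```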

4·1·5. Concept of the most probable value. As mentioned, the values of
the frequency function of a random variable X answer the question:
How frequently does X assume each of its possible values? In many studies
of random variables, the question arises: which of its possible values does
X assume more frequently than the others? In this connection we have
the following definition.

Definition 4·7. If $u_1, u_2, u_3, \ldots, u_k, \ldots, u_n$ are all the different
possible values of a random variable X and if $u_k$ has the property that, whatever
be any other possible value $u_i$ of X, the probability $P\{X = u_k\}$ is at least as
great as the probability $P\{X = u_i\}$,

$$P\{X = u_k\} \ge P\{X = u_i\} \quad \text{for } i = 1, 2, \ldots, n,$$

then $u_k$ is called the most probable value of X.

Figure 16. Frequency Function $p_v(t)$.

It follows from this definition that a random variable with a finite
number of possible values must have one, but may have two or more,
“most probable values.” In fact, if X assumes all of its values equally
frequently, as was the case in Example 1 of subsection 4·1·2, then all of
them satisfy the definition of the most probable value. However, as the
student will surmise, the concept of most probable value was not
introduced with the idea of applying it to frequency functions of this kind.

In Example 2 of subsection 4·1·2, there is just one most probable value
of X. This value is 2. Further possibilities are illustrated in Figures 15, 16,
and 17, which give the graphs of three commonly encountered types of
frequency functions. It will be seen that the variables u and w each have
two most probable values, namely zero and one, and zero and seven,
respectively. The random variable v has just one most probable value, zero.
If there are several most probable values, it is customary to label them in
order from left to right: the first most probable value, the second, and so
on, up to the last.
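Definition 4·7 translates into a search for all values at which the frequency function reaches its maximum. The sketch below uses hypothetical names. It relies on exact fractions so that ties are detected exactly, since ties decide whether there are several most probable values and floating-point rounding could split or merge them. The example assumes, as formula (4·1·1) suggests, that X in Example 2 counts the aces among 8 of 20 cards containing 4 aces.

```python
from fractions import Fraction
from math import comb

def most_probable_values(p):
    """All values u_k with P{X = u_k} >= P{X = u_i} for every i (Definition 4·7),
    listed from left to right; p maps each possible value to its probability."""
    largest = max(p.values())
    return sorted(u for u, prob in p.items() if prob == largest)

# Assumption (as in Example 2): X = number of aces among 8 of 20 cards, 4 of them aces.
p_X = {t: Fraction(comb(4, t) * comb(16, 8 - t), comb(20, 8)) for t in range(5)}
modes = most_probable_values(p_X)
print("most probable values:", modes, "first:", modes[0], "last:", modes[-1])
```

If all values are equally probable, as in Example 1, the function returns every possible value.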

4·2. Binomial variable

4·2·1. Definition. Consider a series of n trials, say $T_1, T_2, \ldots, T_n$.
Assume that each trial is capable of producing one of two possible
outcomes, either E, which we shall describe as “success,” or $\bar E$, which we
shall describe as “failure.” Consider the fundamental probability set S
made up of the possible outcomes of the n trials $T_1, T_2, \ldots, T_n$. In
other words, each element of S is a series of n outcomes of the particular
trials and can be represented by a series of n letters, e.g.,

$$E_1 \bar E_2 \bar E_3 \cdots E_n.$$

This particular series of letters denotes, then, a possible outcome of the
series of trials. In this series success occurred in the first and in the last
trial but not in the second nor in the third. It will be assumed about the
fundamental probability set S: (i) That the probability of success in each
particular trial of the series is the same. This probability will be denoted
by p and its complement by q = 1 − p. (ii) That the occurrence of success
in any particular trial $T_i$ is completely independent of that in all other
trials of the series.

It will be convenient to abbreviate the statement of hypothesis (ii) and
say, simply, that “the n trials $T_1, T_2, \ldots, T_n$ are completely independent.”

With assumptions (i) and (ii), let X stand for the number of successes
in the course of the n trials $T_1, T_2, \ldots, T_n$. Obviously, X is a single-valued
function defined over the fundamental probability set S and, therefore, a
random variable. For reasons which will be apparent later it is called a
“binomial random variable” or “a variable following the binomial
frequency law.”

Definition 4·8. The random variable X, representing the number of
successes in n completely independent trials in which the probability of success
in each particular trial is constant, is called a binomial variable.

4·2·2. Frequency function of the binomial variable. It follows from the
definition of the binomial variable that its possible values are the
non-negative integers 0, 1, 2, ..., n. Let k stand for any one of these integers.

Theorem 4·8. For each value of k = 0, 1, 2, ..., n the frequency
function $p_X(k)$ of the binomial variable X is equal to the coefficient of $u^k$
in Newton's expansion of the binomial $(q + pu)^n$, so that

$$(q + pu)^n = q^n + nq^{n-1}pu + C_2^n q^{n-2}p^2u^2 + \cdots + C_k^n q^{n-k}p^k u^k + \cdots + p^n u^n$$

$$= p_X(0) + p_X(1)u + p_X(2)u^2 + \cdots + p_X(k)u^k + \cdots + p_X(n)u^n.$$

The reference to Newton's expansion of the binomial $(q + pu)^n$ contained
in this theorem explains why the random variable under consideration is
called the “binomial variable.”

Proof. The frequency function $p_X(k)$ is equal to the probability
$P\{X = k\}$ that in the course of n completely independent trials with a
constant probability of success equal to p, success will occur exactly k
times. Since the order in which the k successes occur is immaterial, $p_X(k)$ is
the probability that the outcome of the n trials will be

$$E_1 E_2 \cdots E_k \bar E_{k+1} \bar E_{k+2} \cdots \bar E_n$$

or that it will be

$$\bar E_1 E_2 E_3 \cdots E_k E_{k+1} \bar E_{k+2} \bar E_{k+3} \cdots \bar E_n$$

or any other series in which success occurs exactly k times in any order.
Since the trials are assumed to be completely independent and since the
probability of E in any one trial is equal to p while that of $\bar E$ is equal to
q = 1 − p, an application of the multiplication theorem gives

$$P\{E_1 E_2 \cdots E_k \bar E_{k+1} \bar E_{k+2} \cdots \bar E_n\} = p^k q^{n-k},$$

$$P\{\bar E_1 E_2 E_3 \cdots E_k E_{k+1} \bar E_{k+2} \bar E_{k+3} \cdots \bar E_n\} = p^k q^{n-k}.$$

Indeed, in whatever order we write the k letters E combined with the
n − k letters $\bar E$, the probability of the outcome of the n trials so designated
is equal to the product of n factors; k of these factors equal p and the
remaining n − k equal q, so that the final result is equal to $p^k q^{n-k}$.

It follows that the probability that X will be equal to k is equal to the
product of the expression $p^k q^{n-k}$ times the number of different ways in
which it is possible to select k trials out of the series $T_1, T_2, \ldots, T_n$.
Obviously, this number is equal to $C_k^n$. Thus, whatever be k = 0, 1, 2, ..., n,

$$p_X(k) = P\{X = k\} = C_k^n\, p^k q^{n-k}.$$

Q.E.D.
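Theorem 4·8 can be checked for a small case by computing $p_X(k)$ in the three ways that appear above. The first multiplies out $(q + pu)^n$. The second adds $p^k q^{n-k}$ over every series of outcomes with k successes, as in the proof. The third uses $C_k^n p^k q^{n-k}$. The sketch uses exact fractions, hypothetical function names, and the arbitrary illustrative choices n = 6 and p = 1/4.

```python
from fractions import Fraction
from itertools import product
from math import comb

def expand_binomial(q, p, n):
    """Coefficients of u^0, u^1, ..., u^n in (q + p u)^n, found by multiplying out n factors."""
    coeffs = [Fraction(1)]
    for _ in range(n):
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c * q        # choose the term q from the new factor
            nxt[k + 1] += c * p    # choose the term p u from the new factor
        coeffs = nxt
    return coeffs

def binomial_by_enumeration(p, n):
    """Add p^k q^(n-k) over every series of n outcomes with exactly k successes."""
    q = 1 - p
    probs = [Fraction(0)] * (n + 1)
    for series in product((True, False), repeat=n):   # True stands for E, False for E-bar
        k = sum(series)
        probs[k] += p ** k * q ** (n - k)              # multiplication theorem
    return probs

def binomial_formula(p, n):
    """p_X(k) = C(n, k) p^k q^(n-k) for k = 0, ..., n."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

p, n = Fraction(1, 4), 6
print(expand_binomial(1 - p, p, n) == binomial_by_enumeration(p, n) == binomial_formula(p, n))
```

The enumeration visits all $2^n$ outcome series, so it is only practical for small n. It is included because it follows the proof step by step.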

Because of the foremost importance of the binomial variable, another 
proof of the above theorem will be given in the second volume of this text. 

Example 1. In Chapter 1 we considered the problem of the Chevalier de
Méré. In one of its phases, this problem (subsection 1·3·3) was concerned
with the number of appearances of “double six” in the course of n = 72
throws of a pair of dice. In treating this problem it was tacitly assumed
that (a) the 72 throws are completely independent, and (b) the probability
of obtaining a “double six” in each particular throw is the same, equal
to p. It follows that the number Y of “double sixes” in the course of 72
throws is a binomial variable whose frequency function is determined
by the expansion of the binomial $(q + pu)^{72}$, where q = 1 − p. Again,
the problem treated in subsection 2·4·2 was concerned with the random
variable X, which is the number of games in which double six will occur
at least once. The total number of games was n = 3, and each game
consisted of 24 completely independent throws of two dice. It was assumed
that the probability P of at least one “double six” is the same in each
game.

It follows that X is a binomial variable with frequency function generated
by Newton's expansion of the binomial $(Q + Pu)^3$, where Q = 1 − P.
This frequency function was computed in subsection 2·4·2 by the direct
method.
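For Example 1, the frequency function of X can be produced directly from Newton's expansion. The sketch below adds one assumption that the text does not make, namely that the dice are fair, so that p = 1/36. Then $P = 1 - (35/36)^{24}$, and the frequency function is given by the coefficients of $(Q + Pu)^3$.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 36)        # assumption: fair dice, so a "double six" has probability 1/36 per throw
P = 1 - (1 - p) ** 24      # at least one double six in a game of 24 independent throws
Q = 1 - P

# Frequency function of X (games, out of 3, with at least one double six):
# the coefficients of (Q + P u)^3.
p_X = [comb(3, k) * P ** k * Q ** (3 - k) for k in range(4)]

for k, prob in enumerate(p_X):
    print(k, round(float(prob), 4))
```

Under this assumption P is about .4914.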

Example 2. Let X denote the number of recessives among the n offspring
of a cross between two hybrids gG × gG. The Mendelian theory
implies that the probability that an offspring of two hybrids will be a
recessive is equal to .25 and that the genetic composition of one offspring
is completely independent of the composition of the others. Therefore,
within the framework of the Mendelian theory, the variable X is a
binomial variable with frequency function generated by the expansion of
the binomial $(.75 + .25u)^n$, so that

$$p_X(k) = P\{X = k\} = C_k^n\, (.25)^k (.75)^{n-k}.$$

4·2·3. General properties of the binomial distribution. Since the
binomial distribution plays an exceedingly important role in statistics, we
will study its general properties in some detail. The properties studied
relate to what may be roughly described as the “shape of the frequency
function” and, in particular, to the most probable values of the variable.

Let the letters n, p, and q = 1 − p have the same meaning as in
subsection 4·2·2. In order to avoid a discussion of trivial cases, it will be
assumed that neither p nor q is equal to zero and that the integer n > 1.

Theorem 4·9. The sequence

$$p_X(0),\ p_X(1),\ \ldots,\ p_X(k),\ \ldots,\ p_X(n) \tag{4·2·1}$$

of non-zero values of the frequency function of a binomial variable X is of one
of the following three types.

(i) J-shaped decreasing. Zero is a most probable value of X, with the
possibility that unity also is a most probable value, so that

$$p_X(0) \ge p_X(1).$$

Of the remaining terms of the sequence (4·2·1), each is greater than its
successor,

$$p_X(1) > p_X(2) > \cdots > p_X(n),$$

so that the least probable value of X is n.

(ii) J-shaped increasing. n is a most probable value of X, with the
possibility that n − 1 also is a most probable value, so that

$$p_X(n-1) \le p_X(n).$$

Of the remaining members of the sequence (4·2·1), each is less than its successor,

$$p_X(0) < p_X(1) < \cdots < p_X(n-2) < p_X(n-1),$$

so that the least probable value of X is zero.

(iii) Unimodal. Neither zero nor n is a most probable value of X. The
members of the sequence (4·2·1) increase and reach a maximum at one or,
possibly, two most probable values of X, so that

$$p_X(0) < p_X(1) < \cdots < p_X(k_0 - 1) \le p_X(k_0).$$

Thereafter the members of the sequence (4·2·1) decrease steadily,

$$p_X(k_0) > p_X(k_0 + 1) > \cdots > p_X(n).$$

The least probable value of X is either zero or n or both.

It will be noticed that Theorem 4·9 implies that the number of most
probable values of the binomial variable is either one or two. Also, if
there are two most probable values, then they differ by unity. It will be
convenient to use the letter $k_0$ to denote either the unique most probable
value of X or, if there are two, the greater of them. With this convention
$k_0$ may be described as the last most probable value of X.

The three types of binomial distribution are illustrated in Figures 18(i),
18(ii), and 18(iii).

Figure 18(i). Types of Binomial Distribution, n = 9, p = .1, (n + 1)p = 1.

Proof. To prove Theorem 4·9 we will use a device which appears
useful in studies of the frequency functions of many other variables, in
addition to the binomial variable. The device consists of computing the
sequence of quotients

$$Q_k = \frac{p_X(k)}{p_X(k-1)} \qquad \text{for } k = 1, 2, \ldots, n.$$


If the expression for the quotient $Q_k$ is not complicated, then it may be
easy to judge whether the quotient is less than, equal to, or greater than
unity. The answers to these questions amount to stating that the
probability $p_X(k)$ is less than the preceding probability $p_X(k-1)$, or that
the two are equal, or that the probability $p_X(k)$ is greater than its
predecessor $p_X(k-1)$.

Figure 18(ii). Types of Binomial Distribution, n = 9, p = .9, (n + 1)p = 9.

In the case of the binomial variable, the ratio $Q_k$ has a very simple form,
which the student will have no difficulty in computing:

$$Q_k = \frac{n - k + 1}{k} \cdot \frac{p}{q}. \tag{4·2·2}$$

It is seen that $Q_k$ is a decreasing function of k. Thus

$$Q_1 > Q_2 > \cdots > Q_n. \tag{4·2·3}$$
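Formula (4·2·2) can be compared with the quotients computed straight from the frequency function. The sketch below does both with exact fractions, assuming 0 < p < 1. The function names are hypothetical. It prints the ratios for one illustrative case, which also displays the decreasing order (4·2·3).

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """p_X(k) = C(n, k) p^k q^(n-k) for k = 0, ..., n (Theorem 4·8)."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

def quotients_direct(n, p):
    """Q_k = p_X(k) / p_X(k-1), k = 1, ..., n, computed from the frequency function itself."""
    f = binomial_pmf(n, p)
    return [f[k] / f[k - 1] for k in range(1, n + 1)]

def quotients_formula(n, p):
    """Formula (4·2·2): Q_k = (n - k + 1) p / (k q)."""
    q = 1 - p
    return [Fraction(n - k + 1, k) * p / q for k in range(1, n + 1)]

n, p = 9, Fraction(3, 10)
Q = quotients_formula(n, p)
print([str(x) for x in Q])
```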

The assertions of Theorem 4·9 are simple consequences of formula
(4·2·2) and the inequalities (4·2·3). The three types of binomial
distribution described in Theorem 4·9 correspond to the three possible
relationships between the ratios $Q_k$ and unity.

(1) Unity is greater than or equal to $Q_1$, and thus all the other $Q_k$'s
are less than unity,

$$1 \ge Q_1 > Q_2 > \cdots > Q_{n-1} > Q_n.$$

(2) $Q_n$ is at least equal to unity. In consequence, all the other $Q_k$'s are
greater than unity,

$$Q_1 > Q_2 > \cdots > Q_{n-1} > Q_n \ge 1.$$

(3) Unity is less than $Q_1$ but greater than $Q_n$. Then there must exist
two successive ratios $Q_{k_0}$ and $Q_{k_0+1}$ which bracket unity, so that

$$Q_{k_0} \ge 1 > Q_{k_0+1}.$$

All the ratios preceding $Q_{k_0}$ are necessarily greater than unity, and all
those following $Q_{k_0+1}$ are necessarily less than unity.

In order to prove Theorem 4·9, it is sufficient to interpret the cases
(1), (2), and (3) in terms of the sequence (4·2·1) of non-zero values of
the frequency function $p_X(k)$. Recalling the definition of $Q_k$, we see that
in case (1) the first element in the sequence of the $p_X(k)$'s cannot be less
than the second,

$$p_X(0) \ge p_X(1),$$

and that of the following elements each is greater than its successor, so that

$$p_X(1) > p_X(2) > \cdots > p_X(n).$$

It follows that zero is definitely a most probable value of X, with the
possibility that unity is another most probable value. For both unity and
zero to be most probable values, it is necessary and sufficient that $Q_1 = 1$.
Thus, in case (1) the binomial distribution is J-shaped decreasing.

Case (2) is analogous but reversed. Since $Q_k > 1$ for all values of k,
with the possible exception of $Q_n$, which may equal unity, each probability
$p_X(k)$ is greater than the preceding probability $p_X(k-1)$ for
k = 1, 2, ..., n − 1. If $Q_n > 1$, then $p_X(n-1) < p_X(n)$ and there is just one
most probable value of X, namely $k_0 = n$. If $Q_n = 1$, then $p_X(n-1) = p_X(n)$
and there are two most probable values, namely $k_0 = n$ and $k_0 - 1 = n - 1$.
The binomial distribution is J-shaped increasing. In order that there be two
most probable values of X, it is necessary and sufficient that $Q_n = 1$.

In case (3), $Q_1$ is greater than unity. It is possible also that several of
the Q's following $Q_1$ are also greater than unity. It follows that $p_X(0)$ is
less than $p_X(1)$ and that, of the probabilities following $p_X(1)$, there may
be several greater than their predecessors. This increasing sequence stops
at $p_X(k_0)$, with the possibility that $p_X(k_0 - 1)$ is equal to $p_X(k_0)$. For this
to occur it is necessary and sufficient that $Q_{k_0} = 1$. Since $Q_{k_0+1}$ is less
than unity, so are all the remaining Q's up to $Q_n$. It follows that, from
$p_X(k_0)$ onward, each member of (4·2·1) is greater than its successor.
Thus, in case (3) the binomial distribution is unimodal. This completes
the proof of Theorem 4·9.

It will be noticed that the proof of Theorem 4·9 implies that, for the
existence of two most probable values of X, it is necessary and sufficient
that one of the ratios $Q_k$ be equal to unity.
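The three cases used in the proof can be turned into a small classification routine. This is a sketch under the standing assumptions n > 1 and 0 < p < 1, with hypothetical names. The examples take n = 9 with p = .1, .5, and .9; the first and last are the parameter values shown in the figures.

```python
from fractions import Fraction

def Q(n, p, k):
    """Formula (4·2·2): Q_k = (n - k + 1) p / (k q), defined for k = 1, ..., n."""
    return Fraction(n - k + 1, k) * p / (1 - p)

def binomial_shape(n, p):
    """Classify the binomial frequency function by comparing the ratios Q_k with unity."""
    if Q(n, p, 1) <= 1:
        return "J-shaped decreasing"   # case (1): 1 >= Q_1 > ... > Q_n
    if Q(n, p, n) >= 1:
        return "J-shaped increasing"   # case (2): Q_1 > ... > Q_n >= 1
    return "unimodal"                  # case (3): Q_1 > 1 > Q_n

def last_k_with_Q_at_least_one(n, p):
    """Largest k with Q_k >= 1; in case (3) this is the k_0 with Q_{k_0} >= 1 > Q_{k_0 + 1}.
    Raises ValueError when no such k exists (case (1) with Q_1 < 1, where k_0 = 0)."""
    return max(k for k in range(1, n + 1) if Q(n, p, k) >= 1)

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    print(p, binomial_shape(9, p))
```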

Theorem 4·10. The last most probable value $k_0$ of a binomial variable is
equal to the greatest integer which does not exceed the product (n + 1)p. For
the existence of two most probable values of X it is necessary and sufficient
that (n + 1)p be an integer. Then both $k_0 = (n + 1)p$ and $k_0 - 1 = (n + 1)p - 1$
are most probable values of X.

In order to prove Theorem 4·10 we will show that in each of the three
cases (1), (2), (3) distinguished in Theorem 4·9, the last most probable
value $k_0$ of X satisfies the double condition

$$(n + 1)p - 1 < k_0 \le (n + 1)p. \tag{4·2·4}$$

We shall show also that for one of the quotients $Q_k$ to equal unity, it is
necessary and sufficient that the product (n + 1)p be an integer.

In case (1) the last most probable value of X is either unity or zero
according to whether or not $Q_1$ is equal to unity. Case (1) is characterized
by the condition $Q_1 \le 1$. We write this condition, using (4·2·2), as

$$Q_1 = \frac{np}{q} \le 1,$$

and it follows that in case (1)

$$np \le q = 1 - p$$

or

$$(n + 1)p \le 1.$$

Therefore the extreme right member of formula (4·2·4) is at most equal
to unity.

Since both (n + 1) and p are positive numbers, their product also must
be positive. Therefore the extreme left member of formula (4·2·4) is
greater than minus unity and is equal to zero if (n + 1)p = 1. It follows
that, if (n + 1)p < 1, the only integer satisfying (4·2·4) is $k_0 = 0$. On
the other hand, if (n + 1)p = 1, the only integer satisfying (4·2·4) is
$k_0 = 1$. Since (n + 1)p = 1 only when $Q_1 = 1$, it follows that in case (1)
the last most probable value of the binomial variable X is determined by
the double formula (4·2·4).

In case (2) the last most probable value of X is $k_0 = n$. Case (2) is
characterized by the condition $Q_n \ge 1$. Substituting the value of $Q_n$ as
determined by (4·2·2), we rewrite this condition as

$$Q_n = \frac{1}{n} \cdot \frac{p}{q} \ge 1,$$

or

$$p \ge n(1 - p),$$

and, finally, as

$$(n + 1)p \ge n.$$

Returning to formula (4·2·4), we see that in case (2) its extreme right
member is at least equal to n. As to the extreme left member of the same
formula, we notice that (n + 1)p must be less than n + 1, because p < 1.
Therefore, (n + 1)p − 1 is less than n. Thus formula (4·2·4) asserts that
$k_0$ is between two numbers, (n + 1)p − 1 and (n + 1)p, which differ
by unity, the first number being less than n and the second at least equal
to n. Therefore the only integer satisfying (4·2·4) is $k_0 = n$. This proves
that in case (2) the last most probable value of X is determined by (4·2·4).
In case (3) the last most probable value $k_0$ of X is such that

$$Q_{k_0} \ge 1 > Q_{k_0+1}. \tag{4·2·5}$$

It is easily seen that this condition is equivalent to (4·2·4). Using the
definition of $Q_k$, we rewrite the condition $Q_{k_0} \ge 1$ as

$$\frac{n - k_0 + 1}{k_0} \cdot \frac{p}{q} \ge 1$$

or

$$(n - k_0 + 1)p \ge k_0(1 - p)$$

or, finally,

$$k_0 \le (n + 1)p.$$

This establishes the right-hand inequality in formula (4·2·4). To establish
the other, we use the definition of $Q_k$ and rewrite the condition $1 > Q_{k_0+1}$.
We have

$$\frac{n - k_0}{k_0 + 1} \cdot \frac{p}{q} < 1$$

and

$$(k_0 + 1)(1 - p) > (n - k_0)p,$$

and, finally,

$$(n + 1)p - 1 < k_0,$$

which completes the proof of formula (4·2·4).

Remarks. It will be noticed that the method used to prove formula
(4·2·4) in case (3) is not applicable in case (2), because in this case $k_0 = n$
and the discussion of formula (4·2·5) would require the consideration of
the ratio $Q_{n+1}$. This is impossible because the definition of $Q_k$ applies only
to k = 1, 2, ..., n. Similarly, the method used in case (3) cannot be applied
in case (1), because in this case $k_0$ may be equal to zero and $Q_0$ has no
meaning.

To complete the proof of Theorem 4·10 it is necessary to show that the
cases where the binomial variable has two most probable values, $k_0 - 1$
and $k_0$, coincide with those where the product (n + 1)p is an integer. While
proving Theorem 4·9 it was established that, for the existence of two most
probable values of X, it is necessary and sufficient that one of the ratios,
say $Q_{k_0}$, be equal to unity. This means that

$$\frac{n - k_0 + 1}{k_0} \cdot \frac{p}{q} = 1.$$

But this equality is equivalent to

$$(n - k_0 + 1)p = k_0(1 - p),$$

or, finally, to

$$(n + 1)p = k_0.$$

Since $k_0$ is necessarily an integer, it follows that (n + 1)p is an integer.
Thus the condition that (n + 1)p is an integer is equivalent to the
condition that $Q_{k_0} = 1$. This proves Theorem 4·10.
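Theorem 4·10 lends itself to a brute-force check. For many pairs (n, p), the sketch computes all most probable values directly and compares the last of them with the greatest integer not exceeding (n + 1)p. It uses exact fractions so that ties are detected exactly. The grid of p values and the function names are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb, floor

def binomial_pmf(n, p):
    """p_X(k) = C(n, k) p^k q^(n-k), k = 0, ..., n, as exact fractions."""
    q = 1 - p
    return [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]

def most_probable_values(n, p):
    """All k at which p_X(k) attains its maximum, in increasing order."""
    f = binomial_pmf(n, p)
    top = max(f)
    return [k for k, value in enumerate(f) if value == top]

def agrees_with_theorem_4_10(n, p):
    """True if k_0 = floor((n+1)p) and two modes occur exactly when (n+1)p is an integer."""
    modes = most_probable_values(n, p)
    m = (n + 1) * p                         # a Fraction, since p is a Fraction
    last_ok = modes[-1] == floor(m)
    if m.denominator == 1:                  # (n+1)p is an integer
        return last_ok and modes == [m - 1, m]
    return last_ok and len(modes) == 1

print(all(agrees_with_theorem_4_10(n, Fraction(j, d))
          for n in range(2, 13) for d in (7, 10, 12) for j in range(1, d)))
```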


PROBLEMS AND EXERCISES 


In Problems 1 through 5, X is a binomial variable, and its frequency
function is determined by the expansion $(q + pu)^5$.

1. How many quotients $Q_k = p_X(k)/p_X(k-1)$ are defined for the variable X?
Substitute p = .2 and compute these quotients. What is the first most
probable value of X?

Answer: (a) Five; (b) $Q_1 = 5/4$, $Q_2 = 1/2$, $Q_3 = 1/4$, $Q_4 = 1/8$, $Q_5 = 1/20$; (c) 1.


2. Determine the largest value $p_0$ of p with which the first most probable
value of X is zero. Compute the frequency function of X corresponding
to $p = p_0$. How many most probable values are there?

Answer: (a) $p_0 = 1/6$; (b) .4018, .4018, .1608, .0322, .0032, .0001; (c) two.

3. Determine the range of values of p for which the first most probable
value of X is equal to unity. Denote by $p_1$ the midvalue of this range and
compute the frequency function of X corresponding to $p = p_1$.

Partial answer: (a) 1/6 < p ≤ 1/3.

4. Determine the range of values of p for which the last most probable
value of X is $k_0 = 3$. Are there any values of p within this range such that
there are two most probable values of X? Compute the corresponding
values of the frequency function of X.

5. Determine the values of p for which the first most probable value 
of X is 4. Is there a smallest of these p's? Is there a largest of these p's? 
Compute the frequency function of X corresponding to p = |. 

6. Denote by Y the random variable which equals the number of successes
in the course of n completely independent trials for which the
probability of success is always equal to p = .2. What is the customary
term used to describe the variable Y? What is the value of n if it is known
that the first most probable value of Y is unity?

Partial answer: (b) n is one of 5, 6, 7, 8, or 9.

7. Consider the variable Y in Exercise 6, and determine the smallest
value of n for which the last most probable value of Y is equal to 2.
Compute the corresponding frequency function.

8. Four different experiments of crossing two hybrids gG × gG were
performed, yielding the following numbers of offspring: $n_1 = 5$, $n_2 = 10$,
$n_3 = 20$, and $n_4 = 40$. For each of these experiments, compute the most
probable number of pure recessives among the offspring obtained. Also
compute the probabilities that the actual number of recessive offspring
will be exactly equal to its most probable value. Is it necessarily very
probable that a random variable will assume its most probable value?

4·2·4. Elimination mating. We will use the term elimination mating to
describe a method which is sometimes used to improve a local race of
cattle. Let $R_e$ stand for the local race “to be eliminated” and $R_i$ for the
preferred race “to be introduced.” All the males are removed from race
$R_e$, and the females are crossed with imported males of race $R_i$. Out
of the first generation born from this cross,

$$R_e \times R_i = F_1,$$

all the males are removed and the females crossed (back-crossed) with the
males $R_i$. In this way the second generation is produced,

$$F_1 \times R_i = F_2,$$

etc. It is intuitively evident that when the elimination mating has been
repeated many times, the genes characterizing the race $R_e$ will gradually
be replaced by those appropriate to race $R_i$. The purpose of this subsection
is to consider in detail a simplified version of elimination mating and to
investigate the speed with which the desired effects may be attained. It is
obvious that in treating this problem we need consider only the pairs of
genes for which there is a difference between $R_e$ and $R_i$. The simplification
consists in assuming that both races $R_e$ and $R_i$ are homozygous (i.e.,
either pure recessives or pure dominants) with respect to all the pairs of
genes in which they differ and that all these pairs of genes are nonlinked.

Let n be the number of pairs of genes with respect to which the genetical 
composition of the individuals belonging to R, differs from those belon ging 
to Ri . The genes appropriate to race R. will be denoted by lower-case 
letters and those appropriate to race Ri by capital letters, irrespective of 
whether the genes are dominant or recessive. Thus the genetical compo¬ 
sition of the individuals will be written as 

Rt • gigi t g2g2 > • • * > > • • * > Qngn > 

Ri : GiGi, GaGa , • • • , (?*(?*, • • • , GnG„ . 



[4‘2’4] ELIMINATION MATING 191 

The genetical composition of an individual belonging to Fi (the Fi 
individual, for short) is unambiguously determined 

F 1 : giGi , ( 72 ^ 2 , • • • , QkGk , • • * , QnGn 

and may be characterized by the statement that all Fi individuals possess 
exactly n genes to be eliminated, namely gri , ^2 , • * • , On • The genetica) 
composition of F 2 individuals is uncertain. In fact, when crossing 

F^X Ri F 2 , 

that is, 

9 Q 2 G 2 > • * • ) gnGf) X {GiGi , G 2 G 2 , * * • , GnGr)j 

the only certainty is that the progeny $F_2$ will inherit from its father n genes to be introduced. As to inheritance from the mother, by good luck, all the genes may be of the type to be introduced or, by bad luck, all of these genes may be of the type to be eliminated, and all the intermediate cases are possible, also. Let $X_2$ stand for the number of genes of the type to be eliminated which are present in an $F_2$ individual. Postulating the probabilistic axioms of Mendelian heredity of Chapter 3, we find that $X_2$ is a random variable, capable of assuming values 0, 1, 2, ..., n. Assume now that elimination mating is repeated s + 1 times and denote by $F_{s+1}$ an individual of the (s + 1)th generation born

$$F_s \times R_i \rightarrow F_{s+1}.$$

Further, let $X_{s+1}$ stand for the number of genes of the type to be eliminated which are present in an individual belonging to $F_{s+1}$. $X_{s+1}$ is again a random variable capable of assuming values 0, 1, 2, ..., n. The primary purpose of this subsection is to investigate the probability law of $X_{s+1}$.

Consider a specified gene of the type to be eliminated, say $g_k$, and denote by $r_{s+1}(k)$ the probability that the $F_{s+1}$ individual will possess $g_k$. It is obvious that $r_1(k) = 1$ and $r_2(k) = \tfrac12$, irrespective of the value of k. In order to determine $r_{s+1}(k)$ for an arbitrary value of s, notice that, in order that $F_{s+1}$ possess gene $g_k$, it is necessary and sufficient (i) that the mother $F_s$ of $F_{s+1}$ possess gene $g_k$ [the probability of which is $r_s(k)$] and (ii) that $F_{s+1}$ inherit $g_k$ from its mother (the probability of which is equal to ½). It follows that $r_{s+1}(k)$ is the probability of a logical product and

$$r_{s+1}(k) = r_s(k)\cdot\tfrac12 \tag{4·2·6}$$

for every s. Writing down $r_1(k) = 1$ together with formula (4·2·6) for s = 1, 2, ...,

$$\begin{aligned}
r_1(k) &= 1,\\
r_2(k) &= r_1(k)\cdot\tfrac12,\\
r_3(k) &= r_2(k)\cdot\tfrac12,\\
&\;\;\vdots\\
r_s(k) &= r_{s-1}(k)\cdot\tfrac12,\\
r_{s+1}(k) &= r_s(k)\cdot\tfrac12,
\end{aligned}$$

multiplying and canceling, we obtain

$$r_{s+1}(k) = \left(\tfrac12\right)^s.$$

The important property of this formula is that $r_{s+1}(k)$ is independent of k. It follows that k may be abandoned in our notation. The result obtained may be formulated as follows: whichever gene $g_k$ of the type to be eliminated we consider, the probability that an $F_{s+1}$ individual will possess this gene is the same for all k = 1, 2, ..., n, namely,

$$r_{s+1} = \left(\tfrac12\right)^s.$$

In order to write the frequency function of the random variable $X_{s+1}$, we use the assumption that all the n pairs of genes considered are nonlinked and that, therefore, the inheritance by $F_{s+1}$ of any one of them is completely independent of all the others. Consequently, considering the inheritance of $g_k$ by $F_{s+1}$ as a “success” in a complicated trial (consisting of $F_2$ inheriting $g_k$ from $F_1$, $F_3$ inheriting $g_k$ from $F_2$, ..., etc.), we see that $X_{s+1}$ is the number of successes in n completely independent trials in which the probability of success in any one trial is constant and equal to $r_{s+1}$. It follows that $X_{s+1}$ is a binomial variable with frequency function

$$p_{X_{s+1}}(t) = C_n^t\, r_{s+1}^t\,(1 - r_{s+1})^{n-t} = C_n^t \left(\tfrac12\right)^{st}\left[1 - \left(\tfrac12\right)^s\right]^{n-t} \tag{4·2·7}$$

for t = 0, 1, 2, ..., n.
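Formula (4·2·7) can be checked numerically. The following is a minimal Python sketch, not part of the original text: `elimination_pmf` evaluates (4·2·7), and `simulate_x` (a hypothetical helper) imitates s successive crosses $F_j \times R_i \rightarrow F_{j+1}$ under the assumptions stated above, namely unlinked genes and a probability ½ that a carried gene $g_k$ is passed on. The values n = 6, s = 2 and the random seed are illustrative choices.

```python
import random
from math import comb


def elimination_pmf(t, n, s):
    """Formula (4.2.7): P{X_{s+1} = t} when R_i and R_e differ in n unlinked gene pairs."""
    r = 0.5 ** s  # r_{s+1} = (1/2)^s, chance that F_{s+1} still carries a given g_k
    return comb(n, t) * r**t * (1 - r) ** (n - t)


def simulate_x(n, s, rng):
    """Number of genes g_k carried by one simulated F_{s+1} individual.

    F_1 carries every g_k once (g_k G_k).  In each cross F_j x R_i -> F_{j+1}
    a carried g_k is passed on with probability 1/2, while the R_i parent
    contributes only G genes.  Gene pairs are treated as independent (unlinked).
    """
    kept = n
    for _ in range(s):  # s crosses lead from F_1 to F_{s+1}
        kept = sum(1 for _ in range(kept) if rng.random() < 0.5)
    return kept


n, s, trials = 6, 2, 100_000  # illustrative values: n = 6 pairs, generation F_3
rng = random.Random(1)         # fixed seed so the run is reproducible
counts = [0] * (n + 1)
for _ in range(trials):
    counts[simulate_x(n, s, rng)] += 1

for t in range(n + 1):
    print(t, round(elimination_pmf(t, n, s), 4), round(counts[t] / trials, 4))
```

The simulated relative frequencies should agree with the binomial values to within ordinary sampling fluctuations.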

Formula (4·2·7) can be used to characterize the speed with which the repeated application of elimination mating will lead to the substitution of race $R_i$ for race $R_e$. This may be done in various ways. One of them is by computing the probability that an $F_{s+1}$ individual will be a pure representative of race $R_i$.

For an individual of the (s + 1)st generation to be a pure representative of $R_i$, the random variable $X_{s+1}$ must be equal to zero. The probability of this is

$$P\{X_{s+1} = 0\} = p_{X_{s+1}}(0) = \left[1 - \left(\tfrac12\right)^s\right]^n.$$

★It is easy to see that, whatever the number n of pairs of genes by which $R_i$ differs from $R_e$, the probability $P\{X_{s+1} = 0\}$ tends to unity as s is increased. To prove this, it is sufficient to show that, whatever the positive number ε > 0, the probability $P\{X_{s+1} = 0\}$ will be greater than 1 − ε provided s exceeds a certain limit, say s(ε, n), which depends on both ε and n. To find this limit s(ε, n) we simply solve the inequality

$$P\{X_{s+1} = 0\} = \left[1 - \left(\tfrac12\right)^s\right]^n > 1 - \epsilon.$$

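Easy algebra gives in turn

$$1 - \left(\tfrac12\right)^s > \sqrt[n]{1 - \epsilon},$$

$$1 - \sqrt[n]{1 - \epsilon} > \left(\tfrac12\right)^s,$$

$$2^s > \frac{1}{1 - \sqrt[n]{1 - \epsilon}},$$

and, finally,

$$s > \frac{-\log\left[1 - \sqrt[n]{1 - \epsilon}\right]}{\log 2}.$$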


It follows that, for the probability $P\{X_{s+1} = 0\}$ to exceed 1 − ε, it is necessary and sufficient that s exceed the number

$$s(\epsilon, n) = \frac{-\log\left[1 - \sqrt[n]{1 - \epsilon}\right]}{\log 2}. \tag{4·2·8}$$

Since $P\{X_{s+1} = 0\} \le 1$, it follows that

$$\lim_{s \to \infty} P\{X_{s+1} = 0\} = 1.$$

Formula (4·2·8) shows that the larger the number n of pairs of genes by which $R_i$ differs from $R_e$, the slower the process of elimination of $R_e$. In fact, as n increases, s(ε, n) increases because $\sqrt[n]{1 - \epsilon}$ approaches unity and thus the numerator $-\log\left[1 - \sqrt[n]{1 - \epsilon}\right]$ tends to infinity. Thus, the larger n, the more repetitions of elimination mating are necessary to achieve the same level of the probability that an $F_{s+1}$ individual will be a pure representative of the race to be introduced.

This general discussion is illustrated by the following table of values of $P\{X_{s+1} = 0\}$ corresponding to several values of n.

Table 4·5
Values of P{X_{s+1} = 0} Corresponding to Different Values of n
(probability that a representative of F_{s+1} is a pure R_i)

  s + 1    n = 2    n = 5    n = 10   n = 20   n = 40
    3      .5625    .2373    .0563    .0032    .0000
    4      .7656    .5129    .2631    .0692    .0048
    5      .8789    .7242    .5245    .2751    .0757
    6      .9385    .8532    .7280    .5299    .2808
    7      .9690    .9243    .8543    .7298    .5326
    8      .9844    .9615    .9246    .8548    .7307
    9      .9922    .9806    .9616    .9247    .8551
   10      .9961    .9903    .9806    .9617    .9248
   11      .9980    .9951    .9903    .9806    .9617

Roughly speaking, the figures in the body of Table 4·5 may be interpreted as the proportion of $F_{s+1}$ individuals which are pure representatives of the race to be introduced. Thus the number .5625 in the first column means that, when $R_i$ differs from $R_e$ by only two pairs of genes, then among the individuals of the third generation there will be about 56.25 percent of pure representatives of $R_i$. It is seen that the proportion of pure representatives of $R_i$ among the $F_{s+1}$ individuals decreases as n is increased.
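The entries of Table 4·5 and the threshold (4·2·8) are easy to recompute. The Python sketch below is an illustration added here, not the author's procedure: `prob_pure` evaluates $[1 - (\tfrac12)^s]^n$, and `s_limit` evaluates s(ε, n). The value ε = .05 is an arbitrary illustrative choice.

```python
import math


def prob_pure(n, s):
    """P{X_{s+1} = 0} = [1 - (1/2)^s]^n: the F_{s+1} individual is a pure R_i."""
    return (1 - 0.5 ** s) ** n


def s_limit(eps, n):
    """Formula (4.2.8): P{X_{s+1} = 0} > 1 - eps exactly when s exceeds this number."""
    return -math.log(1 - (1 - eps) ** (1 / n)) / math.log(2)


ns = [2, 5, 10, 20, 40]

# Recompute the body of Table 4.5 (rows: generations s + 1 = 3, ..., 11).
print("s+1" + "".join(f"{'n=' + str(n):>9}" for n in ns))
for generation in range(3, 12):
    s = generation - 1
    print(f"{generation:>3}" + "".join(f"{prob_pure(n, s):9.4f}" for n in ns))

# Smallest whole s exceeding s(eps, n), i.e. giving P{X_{s+1} = 0} > 1 - eps.
eps = 0.05  # illustrative choice of epsilon
s_needed = {n: math.floor(s_limit(eps, n)) + 1 for n in ns}
print(s_needed)
```

The dictionary `s_needed` makes the conclusion of (4·2·8) concrete: the required number of repetitions grows with n, although slowly.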


PROBLEMS AND EXERCISES 

The following problems relate to elimination mating between homozygous races $R_e$ and $R_i$ which differ with respect to n = 6 pairs of genes. $X_{s+1}$ stands for the number of genes of race $R_e$ present in an individual of the (s + 1)st generation.

1. Compute and plot the frequency functions of $X_2$, $X_3$, and $X_4$. What should the value of $X_3$ be in order that an $F_3$ individual is a pure representative of the race to be introduced? How frequently is an individual of the third generation a pure representative of the race to be introduced? How frequently does an individual of $F_4$ possess no more than one gene of the race to be eliminated?

Partial answer: (b) zero; (c) .178; (d) .833.

2. Write down the formula giving the probability, say P(s + 1), that an individual of the (s + 1)th generation will be a pure representative of the race to be introduced. What is the smallest value of s for which P(s + 1) ≥ ½? What is the smallest value of s for which P(s + 1) ≥ .9?

Answer: (a) $\left[1 - \left(\tfrac12\right)^s\right]^6$; (b) 4; (c) 6.

3. Determine the most probable number of genes which remain to be eliminated in an individual of the fifth generation. Answer: zero.

4. How many times would one repeat the elimination mating in order that the most probable number of genes to be eliminated in $F_{s+1}$ is unity? What should s be so that the most probable number of genes to be eliminated in $F_{s+1}$ is zero? Answer: (a) 3; (b) 3.

5. After repeating elimination mating s + 1 times, a total of N individuals of $F_{s+1}$ were obtained. It is given that no two of these individuals have any ancestors in common, except perhaps for $F_1$ and the race $R_i$. Let $Y_{s+1}$ denote the number of pure representatives of the race $R_i$ among the N individuals of $F_{s+1}$. What is the nature of the variable $Y_{s+1}$? Write down its frequency function. Substitute N = 6, and compute and plot the frequency functions of $Y_2$, $Y_3$, and $Y_4$. Compare these frequency functions with those of $X_2$, $X_3$, and $X_4$ of Problem 1.

6. In the general conditions of Problem 5, what is the probability that among N = 6 individuals of , at least four are pure representatives of the race $R_i$? Answer: .444.

7. In the general conditions of Problem 5, what is the probability that none of the N = 6 individuals of $F_4$ are pure representatives of the race $R_i$? Answer: .00817.

8. How large should s be so that the most probable number of pure representatives of $R_i$ among N = 10 individuals of $F_{s+1}$ is $k_0 = 9$? Answer: 5.

9. How many individuals N of the $F_4$ generation should one produce in order that the most probable number of pure representatives of $R_i$ among them is five? Answer: 11.

10. How many times should one repeat elimination mating in order that the most probable number of pure representatives of $R_i$ among N = 6 individuals of $F_{s+1}$ is equal to six? Answer: 7.
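Several of the stated answers can be checked mechanically. The sketch below is an added illustration, not part of the problems: it evaluates (4·2·7) with n = 6 and, for Problems 8 and 9, assumes that $Y_{s+1}$ is binomial with N trials and success probability P(s + 1), which is what the independence condition of Problem 5 suggests. The helper names are hypothetical.

```python
from math import comb

N_GENES = 6  # the problems above take n = 6 pairs of genes


def pmf_x(t, s, n=N_GENES):
    """(4.2.7): probability that an F_{s+1} individual carries t genes of R_e."""
    r = 0.5 ** s
    return comb(n, t) * r**t * (1 - r) ** (n - t)


def p_pure(s):
    """P(s + 1) of Problem 2: probability that F_{s+1} is a pure R_i."""
    return pmf_x(0, s)


def binom_pmf(k, size, p):
    return comb(size, k) * p**k * (1 - p) ** (size - k)


def mode(values):
    """Index of the largest entry (the first one if there is a tie)."""
    return max(range(len(values)), key=values.__getitem__)


ans_1c = round(p_pure(2), 3)                     # F_3: s = 2
ans_1d = round(pmf_x(0, 3) + pmf_x(1, 3), 3)     # F_4: at most one gene left
ans_2b = min(s for s in range(1, 50) if p_pure(s) >= 0.5)
ans_2c = min(s for s in range(1, 50) if p_pure(s) >= 0.9)
ans_3 = mode([pmf_x(t, 4) for t in range(N_GENES + 1)])  # fifth generation, s = 4
# Problems 8 and 9, treating Y_{s+1} as binomial (N trials, success prob. P(s + 1)).
ans_8 = min(s for s in range(1, 50)
            if mode([binom_pmf(k, 10, p_pure(s)) for k in range(11)]) == 9)
ans_9 = min(N for N in range(1, 200)
            if mode([binom_pmf(k, N, p_pure(3)) for k in range(N + 1)]) == 5)
print(ans_1c, ans_1d, ans_2b, ans_2c, ans_3, ans_8, ans_9)
```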

4·3. Weighted binomial variable

4·3·1. Frequency function of the weighted binomial. A series of n completely independent trials, each with the same probability of success, is to be performed in conditions which are not completely determined. These conditions may be either $C_1$, or $C_2$, or ..., or $C_s$. It is given that, should the conditions $C_k$ prevail during the performance of the series of n trials, then the probability of success in each trial is $p_k$. The conditions prevailing during the series of trials are random, and $\alpha_k$ represents the probability that the conditions will be $C_k$. The sum of the probabilities $\alpha_k$ is equal to unity,

$$\alpha_1 + \alpha_2 + \cdots + \alpha_s = 1.$$

In these circumstances the variable X representing the number of successes in the n trials is called the weighted binomial variable. The reason for this terminology will be apparent from the formula for the frequency function of X.

Let $p_X(t)$ denote the frequency function of X. It is obvious that X is capable of assuming values 0, 1, 2, ..., n. Let t be one of them. To compute $p_X(t) = P\{X = t\}$, notice that the variable X can assume the value t in one of s mutually exclusive ways, namely

$$p_X(t) = P\{X = t\} = P\{C_1(X = t) + C_2(X = t) + \cdots + C_k(X = t) + \cdots + C_s(X = t)\}.$$

By the addition theorem

$$p_X(t) = \sum_{k=1}^{s} P\{C_k(X = t)\}.$$

In order to compute the probability of the logical product $C_k(X = t)$, notice that the probability of the condition $C_k$ is $\alpha_k$ and that, given $C_k$, the variable X satisfies the definition of a binomial variable, so that

$$P\{X = t \mid C_k\} = C_n^t\, p_k^t\,(1 - p_k)^{n-t}.$$

Thus,

$$P\{C_k(X = t)\} = \alpha_k\, C_n^t\, p_k^t\,(1 - p_k)^{n-t}$$

and

$$p_X(t) = \sum_{k=1}^{s} \alpha_k\, C_n^t\, p_k^t\,(1 - p_k)^{n-t},$$


which is the final formula for the frequency function sought. It is seen that it is equal to the sum of as many terms as there are different sets of conditions in which the series of n trials may be performed and that each term is a product of the probability $\alpha_k$ by the corresponding binomial probability $C_n^t p_k^t (1 - p_k)^{n-t}$. Since the numbers $\alpha_k$ add up to unity, the process of computing $p_X(t)$ is equivalent to averaging s different binomial probabilities with weights equal to $\alpha_k$. Hence the expression “weighted binomial variable” used to describe the random variable X.
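The averaging described above translates directly into code. Here is a minimal Python sketch (an illustration, not from the original); the two sets of conditions and their numbers are hypothetical.

```python
from math import comb


def weighted_binomial_pmf(t, n, alphas, ps):
    """p_X(t) = sum over k of alpha_k * C_n^t * p_k^t * (1 - p_k)^(n - t).

    alphas[k] is the probability that conditions C_k prevail, and ps[k] is
    the probability of success in each trial under C_k.
    """
    if abs(sum(alphas) - 1) > 1e-9:
        raise ValueError("the probabilities alpha_k must add up to unity")
    return sum(a * comb(n, t) * p**t * (1 - p) ** (n - t)
               for a, p in zip(alphas, ps))


# Hypothetical example: two sets of conditions and n = 4 trials.
alphas, ps, n = [0.3, 0.7], [0.2, 0.6], 4
values = [weighted_binomial_pmf(t, n, alphas, ps) for t in range(n + 1)]
print([round(v, 4) for v in values], round(sum(values), 10))
```

Because each binomial component sums to one and the weights sum to one, the weighted values also sum to one. By the same averaging argument, the mean number of successes is $n\sum_k \alpha_k p_k$, here 4 × .48 = 1.92.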


4·3·2. Problem of diagnosis. The statistical problem of diagnosis has
many different forms and is discussed in this book several times. Although 
the term “diagnosis” is taken from medical practice, the problem of 
diagnosis is encountered in a number of fields of study in addition to 
medicine. In the present subsection we will treat an interesting aspect of 
the problem connected with the recent discovery* that X-ray technique 
is far from absolutely reliable in determining the presence or absence of 
tuberculosis. 

*The discovery is due to cooperation between five noted authorities in radiology, Drs. C. C. Birkelo, W. E. Chamberlain, P. S. Phelps, P. E. Schools, D. Zacks, and a statistician, Professor J. Yerushalmy. In this connection see the following publications: (1) Journ. Am. Med. Assn., Vol. 133, pp. 359-365 (1947); (2) Public Health Reports, Vol. 62, No. 40, pp. 1432-1449 and 1449-1456.




The studies made thus far suggest that the application of X-ray technique is comparable to the following mathematical scheme. The human population can be roughly divided into three categories: (i) those having no trace of tuberculosis in their lungs, which we shall describe briefly as “healthy”; (ii) those with moderate signs of tuberculosis in their lungs, or “moderately affected”; and (iii) those “heavily affected.” Let $\alpha_1$, $\alpha_2$, $\alpha_3$, respectively, be the proportions of individuals in these three categories.

Whether the individual is healthy or not, the outcome of his X-ray examination for tuberculosis is uncertain. Moreover, for each category we postulate the existence of a probability that the outcome of the test will be positive. These probabilities for the three categories enumerated will be denoted by $p_1$, $p_2$, and $p_3$, respectively. Naturally, the probabilities $p_1$, $p_2$, $p_3$ depend not only on the individuals examined but also on the care in preparing the X-ray photographs and on the competence of the radiologist who examines the photographs.

In practical terms, the hypotheses concerning $p_1$, $p_2$, and $p_3$ amount to the following. Imagine that a healthy individual is subjected to a number of independent X-ray examinations by the same X-ray technique and by the same radiologist. To ensure independence it would be necessary to adjust the X-ray machine anew before each photograph and to take care that the radiologist is not aware that the many photographs are of the same individual.

In these conditions, the long-run relative frequency of positive verdicts by the radiologist is $p_1$, postulated to be the same for all healthy individuals. Similarly, $p_2$ and $p_3$ represent the long-run proportions of positive outcomes of many independent X-ray examinations of a moderately affected and a heavily affected individual, respectively.

It is obvious that the above mathematical scheme is a simplification of actual phenomena. In particular, it is very probable that among the “moderately affected” there is a range in the intensity of tuberculosis so that the postulated common probability $p_2$ is really an average.

Returning to the mathematical model, consider a random variable X, defined as the number of positive outcomes in the course of n independent X-ray examinations performed on an individual selected at random from the population. It is obvious that X is a weighted binomial variable and that its frequency function is given by

$$p_X(t) = \alpha_1 C_n^t p_1^t (1 - p_1)^{n-t} + \alpha_2 C_n^t p_2^t (1 - p_2)^{n-t} + \alpha_3 C_n^t p_3^t (1 - p_3)^{n-t}. \tag{4·3·1}$$

PROBLEMS AND EXERCISES 

The study by Drs. Birkelo, Chamberlain, Phelps, Schools, Zacks, and Professor Yerushalmy includes five independent readings of each of 1256 X-ray photographs made by the standard technique. The 1256 individuals photographed were the employees of an institution who consented to cooperate in the study. The following table gives the distribution of the photographs according to the number of positive readings.

Table 4·6
Distribution of Positive Outcomes in Five Independent Readings
of 1256 X-ray Photographs

  No. of positive     No. of X-ray          Relative
  readings, t         photographs, N_t      frequency
       0                  1125                .8957
       1                    47                .0374
       2                    23                .0183
       3                    17                .0135
       4                    17                .0135
       5                    27                .0215
     Total                1256               1.0000


It follows from the table that out of the 1256 X-ray photographs, there were 1125 such that all five independent readings gave negative results. The number of those in which one reading gave a positive result and the four others negative results was 47, etc. These numbers are given in the second column of the table. The last column gives the relative frequency obtained by dividing each number in the second column by the total number 1256 of photographs examined. Thus the last number .0215 in the third column answers the question: how frequently did all five independent readings give positive results?
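Comparing formula (4·3·1) with Table 4·6, as the problems below ask, is a short computation. The following Python sketch is only an aid to that comparison: the observed counts are those of Table 4·6, and the parameter values are the ones assumed in Problem 1. Replacing `ps` with the values of Problem 2 gives the second comparison.

```python
from math import comb

# Observed counts N_t from Table 4.6 (t = 0, ..., 5 positive readings).
observed = [1125, 47, 23, 17, 17, 27]
total = sum(observed)  # 1256 photographs


def formula_431(t, n, alphas, ps):
    """Weighted binomial (4.3.1) for the three categories of individuals."""
    return sum(a * comb(n, t) * p**t * (1 - p) ** (n - t)
               for a, p in zip(alphas, ps))


alphas = [0.931, 0.049, 0.020]  # proportions assumed in Problem 1
ps = [0.007, 0.535, 1.000]      # probabilities of a positive reading, Problem 1
model = [formula_431(t, 5, alphas, ps) for t in range(6)]

print(" t  observed   model")
for t, count in enumerate(observed):
    print(f"{t:>2}  {count / total:8.4f}  {model[t]:6.4f}")
```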

1. Let n = 5 and assume that the proportions of healthy, moderately affected, and heavily affected individuals are

$$\alpha_1 = .931, \qquad \alpha_2 = .049, \qquad \alpha_3 = .020.$$

Assume also that the three probabilities of a positive reading are

$$p_1 = .007, \qquad p_2 = .535, \qquad p_3 = 1.000.$$

Compute the frequency function (4·3·1) for t = 0, 1, 2, 3, 4, 5 and compare its values with the empirical frequencies given in the last column of Table 4·6.




2. Perform the computations of Problem 1 assuming the same values for $\alpha_1$, $\alpha_2$, $\alpha_3$ but different values for the p's, namely

$$p_1 = .02, \qquad p_2 = .90, \qquad p_3 = 1.00.$$

Which of the two systems of values of $p_1$, $p_2$, and $p_3$ seems to fit the observations better?

3. Stars are classified according to luminosity as dwarfs, giants, and supergiants. For a given category, let α be the proportion of dwarfs, β the proportion of giants, and γ = 1 − α − β the proportion of supergiants. The method of classification involves examining spectra of the stars and is subject to error. The probabilities of misclassification are denoted as follows:

  For a dwarf, the probability of classification as giant is $p_1$;
    the probability of classification as supergiant is $p_2$.
  For a giant, the probability of classification as dwarf is $p_3$;
    the probability of classification as supergiant is $p_4$.
  For a supergiant, the probability of classification as dwarf is $p_5$;
    the probability of classification as giant is $p_6$.

A star is selected at random and three independent examinations of its spectrum are made. Let X and Y denote the number of these examinations which result in the star's being classified as a giant and as a supergiant, respectively. What are the possible values of X? What are the possible values of Y? What are the possible combinations of X and Y? Derive formulae for the probabilities P{X = m}, P{Y = n}, and P{(X = m)(Y = n)}.

4. Among patients of a certain category a proportion π suffers from cancer of a specified kind. For purposes of diagnosis, the patients are subjected to a special examination which consists of a single application of test A and of a triple application of test B. All four tests are completely independent and each can give one of two possible results, either “positive” or “negative.” For a person suffering from cancer the probabilities of the result “positive” are $p_1$ by method A and $p_2$ by method B. The same probabilities for a person without cancer are $p_3$ and $p_4$.

A patient is selected at random from the particular category and subjected to the above examination. The letter X denotes the number of “positive” verdicts. What are the possible values of X? If k is a possible value of X, write down the formula representing the probability P{X = k}.

Answer:

$$\begin{aligned}
P\{X = k\} = {}& \pi\left[(1 - p_1)\,C_3^k p_2^k (1 - p_2)^{3-k} + p_1\,C_3^{k-1} p_2^{k-1} (1 - p_2)^{4-k}\right]\\
&+ (1 - \pi)\left[(1 - p_3)\,C_3^k p_4^k (1 - p_4)^{3-k} + p_3\,C_3^{k-1} p_4^{k-1} (1 - p_4)^{4-k}\right].
\end{aligned}$$




5. The method of diagnosing cancer of the lung by examining sputum consists of the following steps: (i) a sample of sputum is collected from the patient; (ii) a small subsample of sputum is transferred to a microscope slide; (iii) the preparation thus obtained is inspected under the microscope for cancerous cells. Each of these three steps may be the origin of failure in the examination.

Denote by $p_1$ the probability that a sample of a patient's sputum will contain cancerous cells, given that he suffers from cancer of the lung.

Denote by $p_2$ the probability of transferring some cancerous cells onto the slide, given that the sample of sputum contains such cells.

Denote by $p_3$ the probability that the slide will be diagnosed as containing cancerous cells, given that it does contain them.

A total of k samples of sputum are taken from a patient and m slides are made from each sample. Each slide is examined n times. Assume that all these k × m × n operations are completely independent and derive the formula for the probability that at least one slide will be diagnosed as containing cancerous cells, given that the patient has cancer of the lung.

Substitute the values $p_1 = .80$, $p_2 = .25$, and $p_3 = .90$ and decide which is better: (a) take 9 samples of sputum, make just one slide from each sample and examine it once, or (b) take k = 3 samples of sputum, make m = 3 slides out of each and examine each slide once.
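The answer given to Problem 4 can be verified by brute force: list all 2⁴ outcomes of test A and the three applications of test B, and add up the probabilities of those with exactly k positives. The Python sketch below does this. The numerical values of π and of the p's are hypothetical, chosen only for the check.

```python
from itertools import product
from math import comb


def binom3(j, p):
    """Probability of exactly j positives in three independent applications of a test."""
    if not 0 <= j <= 3:
        return 0.0
    return comb(3, j) * p**j * (1 - p) ** (3 - j)


def stated_answer(k, pi, p1, p2, p3, p4):
    """The answer printed for Problem 4, with C_3^j taken as 0 outside 0..3."""
    with_cancer = (1 - p1) * binom3(k, p2) + p1 * binom3(k - 1, p2)
    without_cancer = (1 - p3) * binom3(k, p4) + p3 * binom3(k - 1, p4)
    return pi * with_cancer + (1 - pi) * without_cancer


def brute_force(k, pi, p1, p2, p3, p4):
    """Add up the probabilities of every outcome of test A and the three tests B."""
    total = 0.0
    for weight, pa, pb in ((pi, p1, p2), (1 - pi, p3, p4)):
        for outcome in product((0, 1), repeat=4):  # outcome[0] is test A
            prob = weight
            for i, positive in enumerate(outcome):
                p = pa if i == 0 else pb
                prob *= p if positive else 1 - p
            if sum(outcome) == k:
                total += prob
    return total


params = dict(pi=0.1, p1=0.9, p2=0.7, p3=0.05, p4=0.1)  # hypothetical values
for k in range(5):
    print(k, round(stated_answer(k, **params), 6), round(brute_force(k, **params), 6))
```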


4·4. Hypergeometric distribution

4·4·1. Hypergeometric variable. Consider a set $S_0$ composed of $N_1 + N_2$ objects, of which $N_1$ have a distinctive property A (are A-objects, for short) while the other $N_2$ objects do not possess the property A (are B-objects, for short). Assume that $N_1$ and $N_2$ are both greater than zero. Let n be a positive integer $\le N_1 + N_2$ and let G(n) denote an unordered group of n objects selected from the set $S_0$. Finally, let S(n) denote the set of all different unordered groups G(n). The set S(n) will be considered as the fundamental probability set.

Definition 4·9. The random variable X defined over the set S(n) as the number of A-objects among the n objects forming group G(n) is called the hypergeometric variable.

4·4·2. Frequency function of the hypergeometric variable. The reader will recall a number of problems considered in the foregoing subsections which involved hypergeometric variables. Now we shall deduce a general formula giving the non-zero values of the frequency function of the hypergeometric random variable (the hypergeometric frequency function, for short).

To begin with, we notice that the possible values of the hypergeometric variable are necessarily integers between zero and n. However, contrary to what we have seen in the case of the binomial variable, the set of non-negative integers which are possible values of the hypergeometric variable does not necessarily begin with zero nor end with n. In fact, we have the following obvious restrictions which X must satisfy.

Since X is the number of A-objects in G(n), it cannot possibly exceed the total number $N_1$ of A-objects available in the set $S_0$. Similarly n − X represents the number of B-objects in G(n) and therefore cannot exceed the total number $N_2$ of B-objects present in $S_0$. Thus, the random variable X must satisfy simultaneously the following conditions

$$0 \le X \le n, \qquad X \le N_1, \qquad n - X \le N_2.$$

The last condition can be written more conveniently as

$$n - N_2 \le X.$$

The actual limits of the possible values of X depend, then, on which of the above conditions is more restrictive. There are two conditions limiting the value of X from below, namely $0 \le X$ and $n - N_2 \le X$. To satisfy both of these conditions, every possible value of X must be at least equal to the greater of the two numbers zero and $n - N_2$. Thus, if $n \le N_2$, so that $n - N_2 \le 0$, then the more restrictive condition is $0 \le X$, so that X is allowed to take values 0, 1, 2, .... On the other hand, if $n > N_2$, then the more restrictive condition is $n - N_2 \le X$ and the possible values of X are $n - N_2$, $n - N_2 + 1$, ..., etc.

Let us now turn to the conditions limiting the range of values of X from above. There are two such conditions. One of them is $X \le n$ and the other $X \le N_1$. Here again, in order to satisfy both conditions at once, X must satisfy the one that is more restrictive. If $n \le N_1$, then the greatest possible value of X is n. Otherwise, if $n > N_1$, then the greatest possible value of X is $N_1$.

Combining the conclusions deduced with respect to the lowest possible value of X with the conclusions regarding the highest, we obtain the following four cases.

(i) If $n \le \min(N_1, N_2)$, then the possible values of X are $0, 1, 2, \dots, n$.

(ii) If $N_2 < n \le N_1$, then the possible values of X are $n - N_2, n - N_2 + 1, \dots, n$.

(iii) If $N_1 < n \le N_2$, then the possible values of X are $0, 1, 2, \dots, N_1$.

(iv) If $n > \max(N_1, N_2)$, then the possible values of X are $n - N_2, n - N_2 + 1, \dots, N_1$.

Here the symbol $\min(N_1, N_2)$ stands for the smaller of the two numbers $N_1$ and $N_2$. Also, the symbol $\max(N_1, N_2)$ denotes the greater of the two numbers $N_1$ and $N_2$.

Let k stand for a possible value of X. The reader will have no difficulty in verifying that, no matter which of the above four cases presents itself, the frequency function of the hypergeometric variable has the same form, say

$$p_X(k \mid N_1, N_2, n) = \frac{C_{N_1}^{k}\, C_{N_2}^{n-k}}{C_{N_1+N_2}^{n}} = \frac{n!\,(N_1 + N_2 - n)!\,N_1!\,N_2!}{(N_1 + N_2)!\,k!\,(N_1 - k)!\,(n - k)!\,(N_2 - n + k)!}. \tag{4·4·1}$$

The reader will notice that in the notation for the frequency function of X we introduced, to the right of a vertical bar, the symbols $N_1$, $N_2$, and n on which the frequency function depends.
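The four cases above collapse into a single rule: the possible values run from max(0, n − N₂) to min(n, N₁). The Python sketch below, added for illustration, uses that rule together with the factorial form of (4·4·1) and checks that the frequencies add up to unity. The parameter triples, one per case, are hypothetical.

```python
from math import comb, factorial


def support(n1, n2, n):
    """Possible values of X: max(0, n - N2), ..., min(n, N1), covering cases (i)-(iv)."""
    return range(max(0, n - n2), min(n, n1) + 1)


def hypergeom_pmf(k, n1, n2, n):
    """Formula (4.4.1) in its factorial form, evaluated with exact integers."""
    num = factorial(n) * factorial(n1 + n2 - n) * factorial(n1) * factorial(n2)
    den = (factorial(n1 + n2) * factorial(k) * factorial(n1 - k)
           * factorial(n - k) * factorial(n2 - n + k))
    return num / den


# One hypothetical (N1, N2, n) for each of the four cases in the text.
examples = {"(i)": (5, 6, 4), "(ii)": (6, 3, 5), "(iii)": (3, 6, 5), "(iv)": (4, 3, 6)}
for case, (n1, n2, n) in examples.items():
    values = {k: round(hypergeom_pmf(k, n1, n2, n), 4) for k in support(n1, n2, n)}
    print(case, values)
```

Python's integers are exact, so the large factorials cause no rounding until the final division. With hand calculation, as the text goes on to note, this is precisely where the labor lies.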

The hypergeometric variable has many important applications, and some of them, namely the applications to industrial sampling and to the enumeration of zoological populations, are discussed in Chapter 5. The troublesome part in treating practical problems is connected with the fact that ordinarily all the variables in formula (4·4·1), namely $N_1$, $N_2$, n, and k, are large. Thus, $N_1 + N_2$ may mean the total number of manufactured products purchased by a consumer, of which $N_2$ are satisfactory and $N_1$ are defective. Then the symbol n denotes the number of items in the lot which are selected for inspection and, finally, X stands for the number of defective items found among those inspected. Sampling inspection of manufactured products is used in cases when the purchased lots are large. Thus $N_1 + N_2$ is likely to be in the thousands and n in the hundreds.

In the problem of enumeration of zoological populations the situation is similar. Here $N_1 + N_2$ may stand for the total number of salmon going up a river to spawn, $N_1$ is the number of salmon caught, tagged, and released, and, finally, n is the number of salmon inspected after spawning. It is seen that in most practical cases all three numbers, $N_1$, $N_2$, and n, will be very large.

Upon inspecting formula (4·4·1) and, especially, upon computing a few of its values as indicated in the problems which follow this subsection, the reader will appreciate that, when the values of $N_1$, $N_2$, and n are large, the use of formula (4·4·1) is very cumbersome. Therefore, we are faced with the mathematical problem of devising ways and means to obtain at least approximate values of (4·4·1) without undue expense of time and labor. Problems of this kind frequently occur in practical applications of probability and statistics. Some of them are treated in the following section.




PROBLEMS AND EXERCISES 

The following problems are concerned with a random variable X defined as follows. The set $S_0$ consists of $N_1$ objects A and $N_2$ objects B. The fundamental probability set S(n) is made up of all the different unordered groups of $n < N_1 + N_2$ objects selected from $S_0$. The random variable X is defined as the number of objects A within each group belonging to S(n).

1. What is the usual name attached to the variable X? 

2. Determine the possible values of the hypergeometric variable X and the corresponding frequency function,

  (a) if n = 4, $N_1$ = 5, $N_2$ = …,
  (b) if n = 5, $N_1$ = 4, $N_2$ = 7,
  (c) if n = 7, $N_1$ = 5, $N_2$ = …,
  (d) if n = 5, $N_1$ = 7, $N_2$ = 4.


3. Use the methods of subsection 4·2·3 to study the frequency function of the hypergeometric variable X. In particular,

(a) Determine whether or not the sequence of values of the frequency function may belong to a type which is not one of the binomial types (i), (ii), (iii).

(b) How many most probable values can there be of a hypergeometric variable?

(c) Obtain a formula determining the last most probable value of a hypergeometric variable.

4. X is defined as the number of spades in a 13-card hand out of a full deck of 52 cards. What is the usual label attached to the variable X? What are its possible values and its frequency function? What is the last most probable value of X? How many most probable values does X possess? What is the probability that X will assume its first most probable value?

5. X is defined as the number of aces in a 13-card hand out of a full deck of 52 cards. Answer the questions asked in Problem 4.

Answer: (a) Hypergeometric variable; (b) 0, 1, 2, 3, 4;

(c) $$P\{X = k\} = \frac{13!\,39!\,4!\,48!}{52!\,k!\,(4 - k)!\,(13 - k)!\,(35 + k)!};$$

(d) 1; (e) 1; (f) 0.439.
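The answers to Problem 5 can be confirmed numerically, and the same few lines handle Problem 4. The sketch below is an added check, not part of the original. It uses the binomial-coefficient form of (4·4·1), compares it with the factorial expression in answer (c), and takes a bridge hand to be n = 13 cards drawn from $N_1 + N_2 = 52$.

```python
import math
from math import comb, factorial


def hypergeom_pmf(k, n1, n2, n):
    """C(N1, k) C(N2, n - k) / C(N1 + N2, n), equivalent to formula (4.4.1)."""
    return comb(n1, k) * comb(n2, n - k) / comb(n1 + n2, n)


def distribution(n1, n2, n):
    lo, hi = max(0, n - n2), min(n, n1)
    return {k: hypergeom_pmf(k, n1, n2, n) for k in range(lo, hi + 1)}


def modes(dist):
    """All most probable values (more than one only in case of a tie)."""
    top = max(dist.values())
    return [k for k, v in dist.items() if math.isclose(v, top)]


def book_answer_5c(k):
    """Answer (c) of Problem 5: 13! 39! 4! 48! / (52! k! (4-k)! (13-k)! (35+k)!)."""
    num = factorial(13) * factorial(39) * factorial(4) * factorial(48)
    den = (factorial(52) * factorial(k) * factorial(4 - k)
           * factorial(13 - k) * factorial(35 + k))
    return num / den


aces = distribution(4, 48, 13)      # Problem 5: aces in a 13-card hand
spades = distribution(13, 39, 13)   # Problem 4: spades in a 13-card hand
print("aces", modes(aces), round(aces[modes(aces)[0]], 3))
print("spades", modes(spades), round(spades[modes(spades)[0]], 3))
```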





4·5. Limits of the hypergeometric and the binomial
frequency functions

4·5·1. Four useful formulae on limits. In the present section we will deal with limits of frequency functions. Basic definitions and theorems relating to limits are taught in calculus courses and can be found in every good book on the subject.* The purpose of the present subsection is to discuss briefly four formulae which are particularly important in studies of probability.

(i) The number e, base of natural logarithms. Consider the expression

$$f(x) = \left(1 + \frac{1}{x}\right)^x.$$

By substituting the consecutive integers x = 1, 2, ... we obtain the following values of f(x),

$$\begin{aligned}
f(1) &= \left(1 + \tfrac11\right)^1 = 2,\\
f(2) &= \left(1 + \tfrac12\right)^2 = 2.25,\\
f(3) &= \left(1 + \tfrac13\right)^3 = 2.37037\ldots,\\
f(4) &= \left(1 + \tfrac14\right)^4 = 2.44140\ldots,
\end{aligned}$$

etc. It is seen that, as x increases, the function f(x) steadily increases.

In courses on calculus it is proved that for any two positive numbers x < y, whether integers or not,

$$f(x) < f(y).$$

Also it is proved that, however large x, the value of f(x) is always less than 3. From this it follows that, as x is indefinitely increased, the function f(x) approaches a limit which cannot be greater than 3. The limit of the function f(x), as x → +∞, is denoted by e and its value is 2.718282..., so that

$$\lim_{x \to +\infty} f(x) = \lim_{x \to +\infty}\left(1 + \frac{1}{x}\right)^x = e = 2.718282\ldots. \tag{4·5·1}$$
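The slow, steady approach of f(x) to e is easy to observe on a machine. Here is a small Python illustration (not in the original) that tabulates f(x) for a few values of x, including the four computed above:

```python
import math


def f(x):
    """f(x) = (1 + 1/x)^x from the text."""
    return (1 + 1 / x) ** x


for x in (1, 2, 3, 4, 10, 100, 1000, 10**6):
    print(f"{x:>8}  {f(x):.6f}  e - f(x) = {math.e - f(x):.2e}")
```

The printed differences e − f(x) stay positive and shrink roughly in proportion to 1/x. This is consistent with f(x) increasing toward its limit, and it also shows why the sequence is a poor way of actually computing e.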

The number e so defined plays an important role in mathematics. In particular, it is taken as the base of the system of logarithms described as “natural.” The number e is irrational and, therefore, cannot be expressed by means of a finite number of decimals. In actual calculations we frequently need the logarithm of e to the base 10,

$$\log_{10} e = .43429\ldots. \tag{4·5·2}$$

*See for example: J. F. Randolph and M. Kac, Analytic Geometry and Calculus. New York: Macmillan, 1948.

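Taking equation (4·5·1) for granted, we can prove that

$$\lim_{z \to -\infty} f(z) = \lim_{z \to -\infty}\left(1 + \frac{1}{z}\right)^z = e. \tag{4·5·3}$$

In fact, let z be a negative number, z = −x. Then

$$f(z) = \left(1 - \frac{1}{x}\right)^{-x} = \left(\frac{x}{x - 1}\right)^{x} = \left(1 + \frac{1}{x - 1}\right)^{x}$$

and it follows that

$$f(z) = f(-x) = f(x - 1)\left(1 + \frac{1}{x - 1}\right).$$

If z → −∞, then x = −z tends to +∞. At the same time

$$\lim (x - 1) = +\infty, \qquad \lim\left(1 + \frac{1}{x - 1}\right) = 1,$$

and according to (4·5·1)

$$\lim_{(x - 1) \to +\infty} f(x - 1) = e.$$

Thus,

$$\lim_{z \to -\infty} f(z) = e.$$

Q.E.D.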

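Now let a be any real number, positive or negative but different from zero. It is easy to see that

$$\lim_{x \to +\infty}\left(1 + \frac{a}{x}\right)^x = e^a.$$

To prove this we perform the following algebra

$$\left(1 + \frac{a}{x}\right)^x = \left[\left(1 + \frac{1}{x/a}\right)^{x/a}\right]^a.$$

As x tends to +∞, the quotient x/a tends to +∞ if a > 0 or to −∞ if a < 0. Hence, according to (4·5·1) and (4·5·3),

$$\lim_{x \to +\infty}\left(1 + \frac{1}{x/a}\right)^{x/a} = e,$$

and thus

$$\lim_{x \to +\infty}\left(1 + \frac{a}{x}\right)^x = e^a. \tag{4·5·4}$$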

It is suggested that the reader prove for himself that the expression $(1 + a/x)^x$ also tends to $e^a$ if x → −∞. Combining this result with (4·5·4) we may write

$$\lim_{x \to \pm\infty}\left(1 + \frac{a}{x}\right)^x = e^a, \tag{4·5·5}$$

which will be used below. Incidentally, while deducing (4·5·5) we assumed that a ≠ 0; it is obvious that this formula is also correct if a = 0.
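Formula (4·5·5) can be illustrated numerically for both directions of x and for a of either sign, including the trivial case a = 0. This Python sketch is only an illustration; the values of a and x are arbitrary choices.

```python
import math


def g(a, x):
    """(1 + a/x)^x, which by (4.5.5) tends to e^a as x -> +inf or x -> -inf."""
    return (1 + a / x) ** x


for a in (-2.0, 0.0, 0.5, 3.0):
    row = [g(a, x) for x in (1e2, 1e4, 1e6, -1e6)]
    print(a, [round(v, 6) for v in row], round(math.exp(a), 6))
```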

(ii) Infinite expansion of the exponential. The value of the exponential
$e^{a}$ may also be obtained from another formula which is proved in calculus
courses. This formula is valid for every number a and reads

$$e^{a} = \lim_{n\to\infty}\left(1+\frac{a}{1!}+\frac{a^{2}}{2!}+\cdots+\frac{a^{n}}{n!}\right). \tag{4.5.6}$$

In other words, whatever the number a, the value of $e^{a}$ may be approximated
to any desired degree of accuracy by adding a sufficient number of
terms $1 + a/1! + a^{2}/2! + a^{3}/3! + \cdots$. If the reader is not familiar
with the proof of formula (4.5.6), then he should convince himself by
numerical examples that it is correct. With a few selected values of a he
should use the logarithm of e given in (4.5.2) to compute the values of
the left-hand side of (4.5.6). Then the same values of a should be substituted
in the sum in curved brackets on the right-hand side of (4.5.6) with
n = 1, 2, 3, 4, ⋯. It will be seen that the corresponding values of the
sum will approach $e^{a}$. The speed of convergence depends on the value of
a: the larger |a|, the slower the convergence.
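The numerical check suggested above can be sketched in Python. This is a minimal sketch, assuming that the library function `math.exp` may stand in for the logarithm-table computation of the left-hand side of (4.5.6); the helper name `exp_partial_sum` is hypothetical. For a few values of a it prints the error left after adding the terms up to $a^{n}/n!$.

```python
import math

def exp_partial_sum(a: float, n: int) -> float:
    """Sum 1 + a/1! + a**2/2! + ... + a**n/n!, the bracket in (4.5.6)."""
    term = 1.0   # current term a**k / k!, starting with k = 0
    total = 1.0
    for k in range(1, n + 1):
        term *= a / k          # a**k/k! obtained from a**(k-1)/(k-1)!
        total += term
    return total

# math.exp(a) plays the role of the left-hand side of (4.5.6).
for a in (0.5, 2.0, -3.0, 5.0):
    exact = math.exp(a)
    errors = [abs(exp_partial_sum(a, n) - exact) for n in (2, 4, 8, 16)]
    print(f"a = {a:5}: " + "  ".join(f"{err:.1e}" for err in errors))
```

Building each term from the previous one avoids computing powers and factorials separately; the errors in each printed row shrink as n grows, and they shrink more slowly for the larger values of |a|.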





(iii) Consider the expression

$$\phi(n, k) = \frac{n!}{n^{k}(n-k)!},$$

where both n and k ≤ n are positive integers. In the following we shall
need the limit of φ(n, k) when n → ∞ while k is held constant.

To determine this limit we notice that

$$n! = (n-k)!\,(n-k+1)(n-k+2)\cdots(n-1)n$$

and, therefore, after cancellation,

$$\phi(n, k) = \frac{(n-k+1)(n-k+2)\cdots(n-1)n}{n^{k}}. \tag{4.5.7}$$


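The numerator in (4.5.7) has exactly k factors. Writing $n^{k}$ as a product
of k factors all equal to n, we rewrite (4.5.7) as follows:

$$\phi(n, k) = \frac{n-(k-1)}{n}\cdot\frac{n-(k-2)}{n}\cdots\frac{n-1}{n}\cdot\frac{n}{n} = \left(1-\frac{k-1}{n}\right)\left(1-\frac{k-2}{n}\right)\cdots\left(1-\frac{1}{n}\right). \tag{4.5.8}$$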

Now φ(n, k) appears in the form of a product of a fixed number (k − 1) of
factors of the type

$$1-\frac{m}{n},$$

where m is an integer between the fixed limits 1 ≤ m ≤ k − 1. As n is
indefinitely increased, each of these factors tends to unity. Therefore their
product also tends to unity and we obtain the desired result

$$\lim_{n\to\infty}\phi(n, k) = \lim_{n\to\infty}\frac{n!}{n^{k}(n-k)!} = 1. \tag{4.5.9}$$

It will be noticed that formula (4.5.9) implies that

$$\lim_{n\to\infty}\frac{n^{k}(n-k)!}{n!} = 1. \tag{4.5.10}$$

It follows from formula (4.5.8) that φ(n, k) never exceeds unity: it equals
one for k = 1 and is less than one for every k ≥ 2. It also follows that φ(n, k)
tends to unity faster when k is small and more slowly when k is large.
Table 4.7 illustrates the speed with which φ(n, k) tends to unity as n is increased.
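The entries of Table 4.7 can be recomputed from the product form (4.5.8). The following is a minimal Python sketch (the function name `phi` is ours, not the book's) that multiplies the factors 1 − m/n, so that no large factorial is ever formed:

```python
import math

def phi(n: int, k: int) -> float:
    """phi(n, k) = n! / (n**k (n - k)!), evaluated as the product (4.5.8)."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    result = 1.0
    for m in range(1, k):        # factors (1 - m/n) for m = 1, ..., k - 1
        result *= 1.0 - m / n
    return result

ks = (1, 2, 3, 4, 5, 10)
print("   n  " + "".join(f"k={k:<8}" for k in ks))
for n in (10, 20, 30, 40, 50, 100, 500, 1000):
    print(f"{n:4d}  " + "".join(f"{phi(n, k):<10.3g}" for k in ks))
```

Each row rises towards 1 as n grows, and the column k = 10 lags well behind the column k = 2, as stated above.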

(iv) Stirling's approximation to the factorial. When two integer numbers,
say m and n, are both large, then the direct computation of ratios of
the type m!/n! is extremely laborious. However, a good approximation
of the final result can occasionally be obtained without much difficulty
by means of Stirling's approximation to the factorial.




In advanced calculus courses it is demonstrated that for each positive
integer n,

$$n! = \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n} e^{\frac{\theta}{12n}}, \tag{4.5.11}$$

where θ is a complicated function of n which always remains between
zero and one,

$$0 < \theta < 1. \tag{4.5.12}$$

Table 4.7
Values of φ(n, k) = n!/n^k(n − k)!

     n    k = 1   k = 2   k = 3   k = 4   k = 5   k = 10
    10    1.000    .900    .720    .504    .302   .000363
    20    1.000    .950    .855    .727    .581   .0655
    30    1.000    .967    .902    .812    .704   .185
    40    1.000    .975    .926    .857    .771   .293
    50    1.000    .980    .941    .884    .814   .382
   100    1.000    .990    .970    .941    .903   .628
   500    1.000    .998    .994    .988    .980   .913
  1000    1.000    .999    .997    .994    .990   .956


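Since the number e is greater than unity, it follows that

$$1 < e^{\frac{\theta}{12n}} < e^{\frac{1}{12n}}.$$

As n is indefinitely increased,

$$\lim_{n\to\infty} e^{\frac{1}{12n}} = 1,$$

and, therefore, also

$$\lim_{n\to\infty} e^{\frac{\theta}{12n}} = 1. \tag{4.5.13}$$

Actually, $e^{\theta/12n}$ becomes extremely close to one if n is only moderately large.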
It follows that if we write, say,

$$S(n) = \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}, \tag{4.5.14}$$

then

$$\frac{n!}{S(n)} = e^{\frac{\theta}{12n}},$$



which is close to one and therefore S(n) may be considered as an approximation
to n!. The expression (4.5.14) is known as Stirling's approximation to n!.
It is extremely useful in various applications. However, the student must be
clear about the exact meaning of the term "approximation to n!" as applied
to (4.5.14). This meaning is summarized in formula (4.5.13), which can
be written as

$$\lim_{n\to\infty} e^{\frac{\theta}{12n}} = \lim_{n\to\infty}\frac{n!}{S(n)} = 1.$$


Thus, if one computes n! for increasing values of n and divides the results
by the corresponding values of Stirling's approximation (4.5.14), the
ratios so obtained will approach unity. However, the term "approximation
to n!" may suggest a different interpretation. Namely, one may be led to
expect that, as n is increased, the difference

$$D(n) = n! - S(n) = S(n)\left(e^{\frac{\theta}{12n}} - 1\right) \tag{4.5.15}$$

will tend to zero. It is essential to remember that this presumption,
plausible as it may seem, is definitely wrong. In fact, a closer analysis of
the function θ reveals that, although the difference

$$e^{\frac{\theta}{12n}} - 1$$

does tend to zero as n → ∞ and does so very fast, the factor S(n) appearing
in the right-hand side of (4.5.15) tends to infinity even faster. The final
result is that, as n → ∞, the difference D(n) between n! and Stirling's
"approximation" S(n) tends to infinity,

$$\lim_{n\to\infty}\left[n! - \sqrt{2\pi}\, n^{n+\frac{1}{2}} e^{-n}\right] = \infty.$$

The situation is illustrated in Table 4.8, which gives the values of n!,
S(n), n!/S(n) and D(n) for several values of n.

Table 4.8
Stirling's Approximation to the Factorial

   n           n!            S(n)    n!/S(n)   D(n) = n! − S(n)
   2            2             1.9     1.042          0.1
   4           24            23.5     1.021          0.5
   6          720           710.1     1.014          9.9
   8       40,320        39,902.4     1.010        417.6
  10    3,628,800     3,598,695.6     1.008     30,104.4




It is seen that, if n is only moderately large, the ratio n!/S(n) is very close
to unity. On the other hand, the difference D(n) grows without limit as
n is increased.

In order to avoid possible misunderstandings connected with the use of
Stirling's approximation it is advisable to use the exact formula (4.5.11)
combined with the inequalities (4.5.12).
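Both behaviours shown in Table 4.8 are easy to reproduce numerically. The following minimal Python sketch evaluates S(n) of (4.5.14) in ordinary floating point, which is adequate for the moderate n shown (the power n**(n + 0.5) would overflow for n larger than about 140). It prints the ratio n!/S(n) next to the difference D(n) of (4.5.15):

```python
import math

def stirling(n: int) -> float:
    """S(n) = sqrt(2 pi) * n**(n + 1/2) * e**(-n), formula (4.5.14)."""
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

print(f"{'n':>3} {'n!/S(n)':>10} {'D(n) = n! - S(n)':>20}")
for n in (2, 4, 6, 8, 10, 20, 50):
    exact = math.factorial(n)
    s = stirling(n)
    # ratio = e**(theta/(12n)) approaches 1, while the difference keeps growing
    print(f"{n:3d} {exact / s:10.5f} {exact - s:20.6g}")
```

The ratio column drifts down towards 1 while the difference column grows without bound, which is exactly the distinction drawn in the text.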


PROBLEMS AND EXERCISES 


1. The number π = 3.14159⋯ > e = 2.71828⋯. Use the information
supplied in this subsection to judge which of the two numbers
is greater.

Compute the values of f(π) and f(e) and verify your conclusion.

2. Each of the three letters a, b, c stands for some positive number. Use
the information given in this subsection to decide which of the following
two numbers is greater:




a 

h + c 



Substitute a = 2, b = 3, and c = 4 and verify your conclusion numerically.

3. Find the limit



Can you obtain the same number x using an expansion of the form
(4.5.6)? What is the appropriate number a? Substitute this value of a
and several values of n = 2, 4, 6, ⋯ to judge whether x is more rapidly

1 — or by the expansion • 

Partial answer: ^ • 

Ve 

4. Suppose that the value of log e is not available. Compute $x = e^{-\sqrt{2}}$
accurate to three decimal places. Answer: 0.243.

5. The constant number x is determined by the condition


lim - . -1. 

n-.« (n + 3)!(n ~ 7)!n* 

Determine x. Answer: 1. 

6. Use Stirling's formula (4.5.11) to obtain a proof of formula (4.5.9)
that is different from the proof given in the text.

7. Suppose that the value of e is unknown. How can you compute e and
be certain that your result is accurate to three decimal places? Hint: One
of the formulae that should be known to the student gives an easy way to
estimate the precision in computing $e^{x}$ when x is positive.


4.5.2. The binomial distribution as the limiting form of the hypergeometric.
In subsection 4.4.2 it was pointed out that, when all or some
of the three numbers $N_1$, $N_2$, and n are large, the computation of the
hypergeometric frequency function $p_X(k \mid N_1, N_2, n)$ may be very cumbersome.
In the present subsection we shall show that, in an important
category of cases, a good approximation to $p_X(k \mid N_1, N_2, n)$ may be
obtained by computing a much simpler formula representing a binomial
probability. The category of cases in question is when the total number
$N = N_1 + N_2$ of objects in the set $S_0$ is large, the number n of objects
in the group G(n) is moderate and also the proportion

$$p = \frac{N_1}{N_1 + N_2}$$

of A-objects in the set $S_0$ is close neither to zero nor to unity. The specific
result which we shall prove is formulated in Theorem 4.11. In order to
state this theorem it is convenient to express $N_1$ and $N_2$ in terms of N
and p. It is easily found that $N_1 = pN$ and $N_2 = (1 - p)N$. Upon substituting
these expressions into the symbol of the hypergeometric frequency
function, we obtain

$$p_X(k \mid N_1, N_2, n) = p_X(k \mid pN, (1-p)N, n).$$

Theorem 4.11. If the numbers k and n and the proportion p of A-objects
in the set $S_0$ remain fixed while the number $N = N_1 + N_2$ of objects
in $S_0$ is indefinitely increased, then the corresponding hypergeometric frequency
function $p_X(k \mid pN, (1-p)N, n)$ tends to a limit, namely,

$$\lim_{N\to\infty} p_X(k \mid pN, (1-p)N, n) = C_k^n\, p^{k}(1-p)^{n-k}.$$


In order to prove this theorem, we rewrite formula (4.4.1), which gives
$p_X(k \mid N_1, N_2, n)$, by multiplying and dividing the right-hand side by
the same factor

$$\frac{N_1^{k} N_2^{n-k}}{N^{n}} = p^{k}(1-p)^{n-k}$$

and regrouping. We have

$$\begin{aligned} p_X(k \mid N_1, N_2, n) &= p_X(k \mid pN, (1-p)N, n) \\ &= C_k^n\, p^{k}(1-p)^{n-k}\cdot\frac{N_1!}{N_1^{k}(N_1-k)!}\cdot\frac{N_2!}{N_2^{n-k}(N_2-n+k)!}\cdot\frac{N^{n}(N-n)!}{N!}. \end{aligned} \tag{4.5.16}$$




As N tends to infinity while p remains constant, $N_1$ and $N_2$ tend to infinity
also. Using (4.5.9) and (4.5.10) we conclude that, as N → ∞, the last three
factors in the right-hand side of (4.5.16) tend to unity. This proves the
theorem.

To illustrate the use of this result we may return to the example already
mentioned where $S_0$ is a lot of manufactured products. If this lot is small
and contains, say, N = 10 items, then the number X of defectives which
may appear in a sample of n = 3 objects is a hypergeometric variable
which has a frequency function distinctly different from the binomial.
However, if the lot $S_0$ contains N = 1000 items, with the proportion of
defectives neither very small nor very large, and the size n of the sample
is of the order of 10, then the frequency function of the hypergeometric
variable X will be difficult to distinguish from the binomial law as specified
in the above theorem.
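The contrast between a lot of 10 and a lot of 1000 items can be checked directly. In the minimal Python sketch below, the proportion of defectives p = 0.3 is a hypothetical choice (the text fixes no value) and the function names are ours; the sketch reports the largest difference between the hypergeometric frequency function and its binomial limit from Theorem 4.11.

```python
from math import comb

def hypergeometric_pmf(k: int, n1: int, n2: int, n: int) -> float:
    """p_X(k | N1, N2, n): k A-objects among n drawn without replacement."""
    return comb(n1, k) * comb(n2, n - k) / comb(n1 + n2, n)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Limit in Theorem 4.11: C(n, k) p**k (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_discrepancy(N: int, n: int, p: float) -> float:
    """Largest |hypergeometric - binomial| over k = 0, ..., n, with N1 = pN."""
    n1 = round(p * N)
    n2 = N - n1
    return max(abs(hypergeometric_pmf(k, n1, n2, n) - binomial_pmf(k, n, p))
               for k in range(n + 1))

p = 0.3  # hypothetical proportion of defectives
print(max_discrepancy(10, 3, p))      # small lot, sample of 3
print(max_discrepancy(1000, 10, p))   # large lot, sample of 10
```

With these values the small lot differs from the binomial by several hundredths in some probabilities, while the large lot differs only in the third decimal place.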

4.5.3. Poisson Law as the limit of the binomial. Certain applications
require the consideration of the binomial variable X for very large values
of n combined with exceedingly small values of p. An extreme example is
provided by radioactive phenomena. Let n be the number of atoms of a
radioactive substance observed during a fixed period of time T, say one
minute. It is assumed that for each atom there exists the same probability
p of disintegrating during the time T, and that the disintegration of one
particular atom does not alter the probability of disintegration for any of
the others. With these hypotheses the number X of disintegrations during
time T is a binomial variable with frequency function generated by the
expansion of $(q + pu)^{n}$, so that

$$P\{X = k\} = \frac{n!}{k!\,(n-k)!}\,p^{k}(1-p)^{n-k}. \tag{4.5.17}$$

If we attempt to apply this reasoning to any practical case, it will be
necessary to assume that n is a colossal number even if the quantity of
radioactive matter considered is exceedingly small. Also, since observable
disintegrations of atoms are relatively rare, it must be assumed that p is
minute. In these circumstances, the computation of the probability
P{X = k} by means of the familiar binomial formula is very cumbersome
or even prohibitive. It is essential that the reader learn to appreciate
this difficulty, for example, by attempting to compute (4.5.17) for k = 3,
n = 60, p = .05, and for k = 3, n = 600,000, and p = .000005. It will
be seen that, in the second example, the use of ordinary logarithm tables,
with five decimal places, does not provide satisfactory precision.
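The difficulty can be imitated on a computer. The sketch below is a rough model of a hand computation, assuming that every common logarithm is rounded to five decimals before use; this rounding scheme and the function names are illustrative assumptions, not the author's procedure. With p = .000005 the logarithm of q = .999995 rounds to zero, so the factor $q^{n-k}$, which is close to $e^{-3}$, is lost altogether, while ordinary double-precision arithmetic has no trouble.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Formula (4.5.17) evaluated in double precision."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_pmf_five_place_logs(k: int, n: int, p: float) -> float:
    """Same formula, but every log10 is rounded to 5 decimals (a crude model
    of working with five-place logarithm tables)."""
    log_value = (round(math.log10(math.comb(n, k)), 5)
                 + k * round(math.log10(p), 5)
                 + (n - k) * round(math.log10(1 - p), 5))
    return 10 ** log_value

for n, p in ((60, 0.05), (600_000, 0.000005)):
    print(n, p, binom_pmf(3, n, p), binom_pmf_five_place_logs(3, n, p))
```

In the second case the five-place version returns a "probability" greater than one, whereas the double-precision value is close to $\frac{3^{3}}{3!}e^{-3} \approx .2240$.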

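To obviate the difficulty, we put np = λ and compute the limit of the
probability (4.5.17) when n → ∞ while λ and k are held constant. Substituting

$$p = \frac{\lambda}{n}$$

in (4.5.17) and rearranging, we obtain

$$P\{X = k\} = \frac{n!}{k!\,(n-k)!}\left(\frac{\lambda}{n}\right)^{k}\left(1-\frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^{k}}{k!}\cdot\frac{n!}{n^{k}(n-k)!}\cdot\left(1-\frac{\lambda}{n}\right)^{-k}\cdot\left(1-\frac{\lambda}{n}\right)^{n}.$$

We now increase n indefinitely while holding k and λ constant. The first
factor on the right-hand side does not change. Because of (4.5.9), the
second factor tends to unity. The third factor also tends to unity, since its
exponent −k is fixed while λ/n tends to zero. The last factor can be written as

$$\left(1+\frac{-\lambda}{n}\right)^{n}$$

and tends to $e^{-\lambda}$ because of (4.5.5). It follows that

$$\lim_{n\to\infty} P\{X = k\} = \frac{\lambda^{k}}{k!}\,e^{-\lambda}.$$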

The result obtained is summarized in the following theorem.

Theorem 4.12. If the number n of completely independent trials is
indefinitely increased while the probability p of success in each trial is
indefinitely decreased, so that the product np = λ remains constant, then the
frequency function of the corresponding binomial variable tends to the limit

$$\lim p_X(k) = \frac{\lambda^{k}}{k!}\,e^{-\lambda} = p^{*}(k), \text{ say,} \tag{4.5.18}$$

for every fixed integer k ≥ 0.

The function p*(k) defined in (4.5.18) as the limit of the binomial
frequency function is called either the Poisson Law or "The Law of Small
Numbers." The function p*(k) is defined and has positive values for all
non-negative integer values of k = 0, 1, 2, ⋯. Using formula (4.5.6)
it is easy to see that

$$\sum_{k=0}^{\infty} p^{*}(k) = \sum_{k=0}^{\infty}\frac{\lambda^{k}}{k!}\,e^{-\lambda} = 1.$$

In fact, if one factors out the exponential $e^{-\lambda}$, the terms remaining under
the summation sign will add up to $e^{\lambda}$. Hence

$$\sum_{k=0}^{\infty} p^{*}(k) = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^{k}}{k!} = e^{-\lambda}e^{\lambda} = 1.$$

In the theory of probability treated on a more advanced level than in this
book it is usual to study a random variable, say Y, capable of assuming
all non-negative integer values 0, 1, 2, ⋯ with the probability

$$P\{Y = k\} = \frac{\lambda^{k}}{k!}\,e^{-\lambda} = p_Y(k), \text{ say,}$$

where λ stands for a fixed positive number. The variable Y with this
property is called the Poisson variable. The non-zero values of its frequency
function $p_Y(k)$ are given by the Poisson Law.

Since the set of possible values of the Poisson variable is infinite, the
fundamental probability set over which this variable is defined must also
be infinite. Since the scope of the present book is limited to considerations
of finite F.P.S.'s, a closer study of the Poisson variable is impossible.
Nevertheless, the Poisson Law will be used here from time to time and
will be interpreted as a convenient formula which gives the approximate
values of the binomial frequency function when n is large and p is small.
The approximation attained with a given value of n depends on the value
of λ = np. The smaller λ, the better the approximation. The degree of
approximation attained is illustrated in Table 4.9 and Figure 19.

Table 4.9
Binomial Probabilities and the Poisson Law

                    Binomial Frequency Functions                   Poisson Law
  k    p = .5   p = .2   p = .1   p = .05   p = .02   p = .01      λ = 3
       n = 6    n = 15   n = 30   n = 60    n = 150   n = 300
  0    .0156    .0352    .0424    .0461     .0483     .0490        .0498
  1    .0937    .1319    .1413    .1455     .1478     .1486        .1494
  2    .2344    .2309    .2276    .2259     .2248     .2244        .2240
  3    .3125    .2501    .2361    .2298     .2263     .2252        .2240
  4    .2344    .1876    .1771    .1724     .1697     .1689        .1680
  5    .0937    .1032    .1023    .1016     .1011     .1010        .1008
  6    .0156    .0430    .0474    .0490     .0499     .0501        .0504
  7      -      .0138    .0180    .0199     .0209     .0213        .0216
  8      -      .0035    .0058    .0069     .0076     .0079        .0081
  9      -      .0007    .0016    .0021     .0025     .0026        .0027
 10      -      .0001    .0004    .0006     .0007     .0008        .0008
 11      -      .0000    .0001    .0001     .0002     .0002        .0002
 12      -      .0000    .0000    .0000     .0000     .0000        .0001

It is seen that the Poisson Law provides a good approximation to the
binomial even if n is only moderately large. Useful tables of the Poisson
Law are due to E. C. Molina [2].
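Table 4.9 can be regenerated with a few lines of Python. In this minimal sketch (function names ours) every binomial column keeps np = 3, matching the Poisson column with λ = 3, and the binomial frequency function is taken as zero for k > n. The Poisson Law is evaluated through `math.lgamma` so that large k cause no overflow.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial frequency function; zero when k exceeds n."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson Law (4.5.18), lam**k / k! * e**(-lam), computed via logarithms."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

columns = [(6, 0.5), (15, 0.2), (30, 0.1), (60, 0.05), (150, 0.02), (300, 0.01)]
for k in range(13):
    row = [binom_pmf(k, n, p) for n, p in columns] + [poisson_pmf(k, 3.0)]
    print(f"{k:2d}  " + "  ".join(f"{v:.4f}" for v in row))

# Largest absolute deviation from the Poisson Law in each binomial column
for n, p in columns:
    dev = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, 3.0)) for k in range(n + 1))
    print(n, p, round(dev, 4))
```

The last loop summarizes the table in one number per column; the deviation falls steadily as p decreases and n increases.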

PROBLEMS AND EXERCISES 

1. Use the method of subsection 4.2.3 to prove that the Poisson frequency
function must be either J-shaped decreasing or unimodal.

2. Prove that the last most probable value $k_0$ of the Poisson variable
is determined by the double inequality λ − 1 < $k_0$ ≤ λ. How many most
probable values can there be? What is the necessary and sufficient condition
for the existence of exactly two most probable values of the Poisson
variable?



Figure 19. Binomial Probabilities and the Poisson Law. 

3. Y is a Poisson variable and it is given that P{Y = 0} = .1. Compute
the probability P{Y = k}, for k = 1, 2, 3, 4, accurate to two decimal places.

4. Y is a Poisson variable and it is given that P{Y = 3} = P{Y = 4}.
Compute P{Y = k}, for k = 0, 1, 2, ⋯, 7, accurate to three decimal
places. Answer: .018, .073, .147, .195, .195, .156, .104, .060.

5. Y is a Poisson variable and it is given that P{Y = 1} is three times
as large as P{Y = 4}. Compute a few non-zero values of the frequency
function of Y.

4.5.4. Normal law as the limit of the standardized binomial. The Poisson
limit is of considerable help when computing the binomial frequency
function corresponding to large n and small p. The same passage to the
limit is useful when n is large and p is close to unity. In this case, q = 1 − p
will be close to zero and the frequency function of the number Y = n − X
of failures in the course of n completely independent trials will be approximately
equal to the Poisson Law with λ = nq = n(1 − p). However,
many problems of application require the consideration of the binomial
variable X corresponding to large values of n and to values of
p and q = 1 − p, neither of which is close to zero. In cases of this kind,
the Poisson approximation is inapplicable, and other methods are necessary
to obviate the difficulty connected with the direct computation of
the binomial.

The first idea likely to occur in cases of this kind is to seek the limit
of the binomial probability

$$P\{X = k\} = \frac{n!}{k!\,(n-k)!}\,p^{k}q^{n-k}$$

when n → ∞ and p and k are held constant. The following theorem indicates
that this idea is not helpful.

Theorem 4.13. If X(n) is the binomial variable defined as the number of
successes in n completely independent trials with a fixed probability of success
p, and if k(n) denotes the last most probable value of X(n), then

$$\lim_{n\to\infty} P\{X(n) = k(n)\} = 0.$$

In other words, the theorem asserts that, if n is increased while p is held
constant, then even the greatest value of the frequency function of the
corresponding binomial variable tends to zero. It follows that the frequency
function tends to zero for all values of its argument.

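Proof. We begin by writing the expression of the probability
P{X(n) = k(n)} and using Stirling's formula (4.5.11) for each of the
three factorials. To simplify the notation we will occasionally write simply
k instead of k(n):

$$P\{X(n) = k(n)\} = \frac{n!}{k!\,(n-k)!}\,p^{k}q^{n-k} = \frac{\sqrt{2\pi}\,n^{n+\frac12}e^{-n}e^{\frac{\theta_1}{12n}}\;p^{k}q^{n-k}}{\sqrt{2\pi}\,k^{k+\frac12}e^{-k}e^{\frac{\theta_2}{12k}}\cdot\sqrt{2\pi}\,(n-k)^{n-k+\frac12}e^{-(n-k)}e^{\frac{\theta_3}{12(n-k)}}},$$

where $\theta_1$, $\theta_2$, and $\theta_3$ are positive numbers less than unity. Upon cancelling
the exponentials $e^{-n}$ and $e^{-k}e^{-(n-k)}$ and rearranging, this formula is brought
to the following convenient form:

$$P\{X(n) = k(n)\} = \frac{1}{\sqrt{2\pi npq}}\cdot e^{\frac{\theta_1}{12n}-\frac{\theta_2}{12k}-\frac{\theta_3}{12(n-k)}}\cdot\left(\frac{np}{k}\right)^{k+\frac12}\left(\frac{nq}{n-k}\right)^{n-k+\frac12}. \tag{4.5.19}$$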





The first of the four factors on the right-hand side contains n in the
denominator and, therefore, tends to zero as n is increased,

$$\lim_{n\to\infty}\frac{1}{\sqrt{2\pi npq}} = 0.$$

If we show that, as n is increased, each of the three remaining factors is
bounded, that is to say, that none of them can exceed some fixed number,
then it will follow that the whole product on the right-hand side of (4.5.19)
tends to zero as n → ∞. With respect to the second factor the conclusion
is immediate. We notice first that, for sufficiently large values of n, both
k(n) and n − k(n) are at least equal to unity. In fact, using formula
(4.2.4) we have

$$(n+1)p - 1 < k(n) \le (n+1)p,$$

which is

$$np - q < k(n) \le np + p. \tag{4.5.20}$$

Therefore,

$$nq - p \le n - k(n) < nq + q.$$

For k(n) to be at least equal to unity it is necessary and sufficient that

$$np - q \ge 0,$$

which may be written

$$n \ge \frac{q}{p}. \tag{4.5.21}$$

For n − k(n) to be at least equal to unity it is necessary and sufficient that

$$nq - p > 0,$$

which may be written

$$n > \frac{p}{q}. \tag{4.5.22}$$

Thus, if n exceeds the greater of the two limits (4.5.21) and (4.5.22),
then both k(n) and n − k(n) are at least equal to unity. If so, then whatever
the values of $\theta_1$, $\theta_2$, and $\theta_3$ with $0 < \theta_1, \theta_2, \theta_3 < 1$,

$$e^{\frac{\theta_1}{12n}-\frac{\theta_2}{12k}-\frac{\theta_3}{12(n-k)}} < e^{\frac{\theta_1}{12n}} < e^{\frac{1}{12n}} \le e^{\frac{1}{12}}.$$

It follows that, as n is increased, the second factor on the right-hand side
of (4.5.19) remains bounded. To prove the same statement concerning the
third factor, we use the first of the inequalities in (4.5.20), rewriting it
in an equivalent form

$$\frac{1}{k(n)} < \frac{1}{np - q}$$





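or

$$\frac{np}{k(n)} < \frac{np}{np - q} = \frac{np - q + q}{np - q} = 1 + \frac{q}{np - q}.$$

From this there follows that

$$\left(\frac{np}{k(n)}\right)^{k(n)+\frac12} < \left(1 + \frac{q}{np - q}\right)^{np + p + \frac12}, \tag{4.5.23}$$

owing to the second inequality in (4.5.20). Since the exponent

$$np + p + \frac12 = (np - q) + q + p + \frac12 = q\cdot\frac{np - q}{q} + \frac32,$$

formula (4.5.23) implies

$$\left(\frac{np}{k(n)}\right)^{k(n)+\frac12} < \left[\left(1 + \frac{1}{x}\right)^{x}\right]^{q}\left(1 + \frac{1}{x}\right)^{\frac32}, \text{ say,}$$

where x = (np − q)/q. When n is so large that x > 1, then the second
factor on the right-hand side is less than $2^{3/2}$. At the same time (see subsection
4.5.1), the first factor is less than $e^{q}$. Therefore, for the same
sufficiently large values of n,

$$\left(\frac{np}{k(n)}\right)^{k(n)+\frac12} < 2^{\frac32}e^{q}.$$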


By a similar argument the student will have no difficulty in proving that
the last factor on the right-hand side of (4.5.19) is also bounded. Namely,
it is easy to show that for n > 2p/q

$$\left(\frac{nq}{n - k(n)}\right)^{n - k(n) + \frac12} < 2^{\frac32}e^{p}.$$

It follows that, as n is indefinitely increased, even the greatest value of
the frequency function of the binomial variable X(n) tends to zero.
Figure 20 illustrates the contents of the theorem just proved. The dots 
which are connected by segments of straight lines give the values of the 
frequency function of binomial variables for p = J and for increasing 
values of n = 2, 5, 8, 11, and 14. The points relating to any given value 
of n are connected together for the sole purpose of making them easier 
to distinguish from the others. It is seen that, as n is increased, the possible 
values of the binomial variable extend over larger and larger sections of 
the axis of abscissae. At the same time the frequency function "flattens,"
at first rapidly and then slowly, and all of its values tend to zero. 
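The flattening can also be watched numerically. The sketch below is a minimal Python illustration; the fixed success probability p = 0.3 is a hypothetical choice, since the value used in Figure 20 is not legible here. It evaluates the binomial probability at the last most probable value k(n), taken as the integer part of (n + 1)p, which is the largest integer allowed by (4.2.4). It works with logarithms of factorials via `math.lgamma` so that large n cause no overflow. The last column multiplies the peak by $\sqrt{2\pi npq}$, the reciprocal of the first factor of (4.5.19).

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability computed through log-factorials (math.lgamma)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def last_most_probable(n: int, p: float) -> int:
    """k(n) = integer part of (n + 1) p, the largest k allowed by (4.2.4)."""
    return math.floor((n + 1) * p)

p = 0.3  # hypothetical fixed probability of success
for n in (10, 100, 1_000, 10_000, 100_000):
    k = last_most_probable(n, p)
    peak = binom_pmf(k, n, p)
    print(n, k, f"{peak:.5f}", f"{peak * math.sqrt(2 * math.pi * n * p * (1 - p)):.4f}")
```

The peak keeps decreasing, while the last column settles near 1. This agrees with the proof above: the whole decrease is carried by the factor $1/\sqrt{2\pi npq}$, and the remaining factors stay bounded.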

In order to see that the passage to the limit described in the foregoing
theorem is not very useful, one only has to try to use this passage to answer
questions like the following. Suppose that p = n = 100: What is the 
probability that the corresponding binomial variable X will satisfy the 




inequalities 50 < X ≤ 55? The answer would have to be somewhat like
the following. The probability in question is the sum of five probabilities

$$P\{50 < X \le 55\} = P\{X = 51\} + P\{X = 52\} + P\{X = 53\} + P\{X = 54\} + P\{X = 55\}. \tag{4.5.24}$$

Since the value of n under consideration is rather large and since it is
known that as n increases, all non-zero values of the binomial frequency
function tend to zero, the conclusion is that each of the five terms on the
right-hand side of (4.5.24) must be rather small. Thus, their sum
P{50 < X ≤ 55} cannot be very large.

Figure 20. Frequency Function of the Binomial Variable, p = j.

While the conclusion reached is correct, it is not very informative. More
precise and very useful results are obtained from the following theorem
due to Laplace.

Theorem 4.14 (Laplace's Theorem). Whatever be two numbers $t_1 < t_2$,
and whatever be the fixed value of the probability of success p, 0 < p < 1, if
the number n of completely independent trials is indefinitely increased, then
the probability that the corresponding binomial variable X(n) will satisfy the
inequalities

$$t_1 < \frac{X(n) - np}{\sqrt{np(1-p)}} < t_2 \tag{4.5.25}$$




tends to the limit

$$\frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-\frac{x^{2}}{2}}\,dx.$$

In other words, the theorem of Laplace asserts that

$$\lim_{n\to\infty} P\left\{t_1 < \frac{X(n) - np}{\sqrt{np(1-p)}} < t_2\right\} = G(t_2) - G(t_1),$$

where the function G(t) is defined by the equation

$$G(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-\frac{x^{2}}{2}}\,dx. \tag{4.5.26}$$

A detailed proof of the theorem of Laplace requires more mathematical
maturity than may be expected from the average reader of the present
book. For this reason the proof is given separately in the advanced section
4.6. However, it is essential that the reader acquire familiarity with the
theorem and its applications. The first step in this direction is an understanding
of formula (4.5.26).

The integral on the right is commonly known as the Gauss-Laplace
integral or the Normal integral. The function under the integral sign, say

$$y = g(x) \equiv \frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^{2}}{2}},$$

is called the standardized or reduced Normal probability density function
and plays an exceptionally important role in probability and statistics. It
is defined for all real values of the argument x, has a unique maximum at
x = 0 (equal to the reciprocal of √(2π)), and decreases steadily as |x| is
increased. Since g(x) depends only on the second power of x, it is obvious
that

$$g(x) = g(-x),$$

so that the function g(x) is symmetrical about the ordinate axis. Figure 21
represents the graph of g(x).

All the foregoing details concerning the function g(x) must be accessible
to the student, who will have no difficulty in constructing a graph of g(x).
In doing so he will need the logarithm of the number e given in (4.5.2).
However, the following information will have to be taken on faith.

The area under the curve y = g(x) extending from minus infinity to
plus infinity is exactly equal to unity. The area under the same curve
extending from −∞ up to x = t (shaded area in Figure 21) is represented
by the integral G(t) of (4.5.26). In other words, whatever point x = t on
the axis of abscissae we take, the integral G(t) is equal to the area to the
left of the vertical x = t, bounded by this vertical, by the curve y = g(x),
and by the axis of abscissae.

It is unfortunate that the integral G(t) cannot be expressed in terms of
elementary functions, such as polynomials, fractions, etc. However, the
difficulty involved is overcome by the use of tables of this integral, which
exist in great variety. The most extensive and convenient tables of the
Normal integral were prepared by the U.S. Federal Works Agency, Work
Projects Administration for the City of New York, under the guidance of
Arnold N. Lowan [1], and represent a monument to the useful organizational
effort during the depression. The tables are published in two volumes
of over 300 pages each. The quantities tabled are given with an accuracy
of fifteen decimal places, and the argument is given with four significant
figures. Sold at a very moderate price, the WPA tables are generally
accessible and provide a very useful tool for anyone engaged in computations
involving the Normal integral.

As far as the scope of the present book is concerned, the WPA tables
are much too extensive. All that the student will need is summarized in
two one-page tables given at the end of the book. The first of these tables,
called the Direct Table of the Normal Integral, gives the values of, say,
H(x) = G(x) − G(0) for positive values of x, proceeding at intervals of
.01. Thus H(t) is equal to the area under the curve y = g(x) bounded by
the two verticals at x = 0 and at x = t > 0. The table is arranged somewhat
as tables of logarithms. Each line corresponds to the value of the
argument given at the left column with precision of .1. The successive
columns correspond to the values of the second decimal of the argument.
Thus, to find the value of H(x) corresponding to x = 1.53, one must find
the line corresponding to x = 1.5 and then locate the column corresponding
to .03. The intersection of these gives H(1.53) = .4370.

The second table of the Normal integral is the inverse of the first and
is arranged in a similar way. In other words, the argument of the second
table is H(x) and the quantity tabled is the corresponding x. Thus, that
value of x for which H(x) = .025 is found in the Inverse Table of the
Normal Integral at the intersection of the line marked H = .02 and the
column marked .005. The value is x = .0627.

It is easy to see that the two tables of the Normal integral just described
provide all the information needed about the integral G(x). Because of
the symmetry of the function g(x), the integral G(0) = ½. Thus, if t is
positive, then

$$G(t) = \tfrac12 + H(t).$$

This is illustrated in Figure 21. If t is negative, then

$$G(t) = \tfrac12 - H(|t|),$$

again because of the symmetry of g(x).

Applications of the theorem of Laplace require the values of the differences
$G(t_2) - G(t_1)$. These values are easy to obtain from the Direct
Table of the Normal Integral. We have to distinguish the following three
cases.

(i) $t_1$ and $t_2$ are both non-negative, $0 \le t_1 < t_2$. In this case the difference
$G(t_2) - G(t_1)$ is equal to the area under the curve y = g(x) bounded by
the two verticals $x = t_1$ and $x = t_2$, both situated to the right of the origin
x = 0, and it is easy to see that

$$G(t_2) - G(t_1) = H(t_2) - H(t_1).$$

(ii) Both $t_1$ and $t_2$ are nonpositive, so that $t_1 < t_2 \le 0$. In this case the
difference $G(t_2) - G(t_1)$ is equal to the area under the curve y = g(x)
bounded by two verticals both situated to the left of the origin. Because
of the symmetry of g(x), it is obvious that this area is equal to that
bounded by two verticals placed symmetrically on the right of the origin,

$$G(t_2) - G(t_1) = G(-t_1) - G(-t_2),$$

and thus

$$G(t_2) - G(t_1) = H(-t_1) - H(-t_2).$$

(iii) $t_1$ is negative and $t_2$ is positive, $t_1 < 0 < t_2$. In this case $G(t_2) - G(t_1)$
is equal to the total of two areas, one between the verticals at $x = t_1$
and x = 0, and the other between the verticals at x = 0 and $x = t_2$.



Thus, in this case,

$$G(t_2) - G(t_1) = [H(-t_1) - H(0)] + [H(t_2) - H(0)] = H(-t_1) + H(t_2).$$
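In place of the printed tables, G(t) can be evaluated with the error function, since $G(t) = \frac12\left[1 + \operatorname{erf}(t/\sqrt2)\right]$. The following minimal Python sketch uses `math.erf` for this purpose; the function names mirror the symbols of the text but are our own. It recovers the two table look-ups quoted above, finds the inverse by simple bisection (an illustrative choice), and applies cases (i) to (iii).

```python
import math

def G(t: float) -> float:
    """Normal integral (4.5.26): area under g(x) from -infinity to t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def H(x: float) -> float:
    """Direct Table quantity H(x) = G(x) - G(0), used here for x >= 0."""
    return G(x) - 0.5

def H_inverse(h: float, lo: float = 0.0, hi: float = 10.0) -> float:
    """Inverse Table: the x >= 0 with H(x) = h (0 <= h < 0.5), by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if H(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def G_difference(t1: float, t2: float) -> float:
    """G(t2) - G(t1) for t1 < t2, written with H only, as in cases (i)-(iii)."""
    if t1 >= 0:                       # case (i): both non-negative
        return H(t2) - H(t1)
    if t2 <= 0:                       # case (ii): both non-positive
        return H(-t1) - H(-t2)
    return H(-t1) + H(t2)             # case (iii): t1 < 0 < t2

print(round(H(1.53), 4))           # Direct Table look-up
print(round(H_inverse(0.025), 4))  # Inverse Table look-up
```

The three branches of `G_difference` only ever call H with a non-negative argument, which is the point of the case analysis: one table of positive arguments suffices.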


4.5.5. The use of the Normal integral as an approximation to the binomial
probability. The theorem of Laplace asserts that, whatever be
p, 0 < p < 1, the probability that a binomial variable X(n) will satisfy
the inequalities (4.5.25) tends to $G(t_2) - G(t_1)$ as n → ∞. Whenever n is
large and neither p nor q = 1 − p is very small, this theorem justifies the
presumption that the probability

$$P\left\{t_1 < \frac{X(n) - np}{\sqrt{np(1-p)}} < t_2\right\}$$

is approximately equal to the difference $G(t_2) - G(t_1)$.

In order to use the Normal integral as an approximation to the sum of
binomial probabilities, one has to identify the values of $t_1$ and $t_2$ relating
to any given case. Let n and p be fixed, and suppose that it is desired to
compute the probability that the binomial variable X(n) satisfies the
conditions

$$a < X(n) < b,$$

where a and b are any two numbers, 0 < a < b < n. Easy transformations
give

$$P\{a < X(n) < b\} = P\left\{\frac{a - np}{\sqrt{npq}} < \frac{X(n) - np}{\sqrt{npq}} < \frac{b - np}{\sqrt{npq}}\right\}$$

and, if n is large enough, the approximate value of the probability will be
given by

$$G\left(\frac{b - np}{\sqrt{npq}}\right) - G\left(\frac{a - np}{\sqrt{npq}}\right),$$

so that

$$t_1 = \frac{a - np}{\sqrt{npq}}, \qquad t_2 = \frac{b - np}{\sqrt{npq}}.$$

While the procedure just described is essentially correct, it is interesting
to notice the possibility of improving it somewhat. The necessity of some
improvement is apparent from the fact that, if the number a is varied
between any two successive integers $k_1 - 1$ and $k_1$, so that

$$k_1 - 1 \le a < k_1, \tag{4.5.27}$$

then the probability P{a < X(n) < b} remains unchanged. Similarly, this
probability will not change its value if b is allowed to vary between the
integers $k_2 > k_1$ and $k_2 + 1$,

$$k_2 < b \le k_2 + 1. \tag{4.5.28}$$

As the reader will have no difficulty in perceiving, the reason for the above
circumstances is that X(n) is capable of assuming only integer values and
that, therefore,

$$P\{a < X(n) < b\} = P\{X(n) = k_1\} + P\{X(n) = k_1 + 1\} + P\{X(n) = k_1 + 2\} + \cdots + P\{X(n) = k_2\} \tag{4.5.29}$$

irrespective of the values of a and b between the limits (4.5.27) and
(4.5.28). Although the variation of a and b within the limits indicated
leaves the probability (4.5.29) unchanged, it does change the arguments
$t_1$ and $t_2$ of the Normal approximation to the same probability. In this
connection it is reasonable to inquire which value of a between the limits
$k_1 - 1 \le a < k_1$ and which value of b between the limits $k_2 < b \le k_2 + 1$
should be taken to obtain the best approximation.

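The answer to this question depends on many factors, such as the values of $n$, $p$, $k_1$ and $k_2$, and also on the exact definition of the expression “best approximation.” Frequently the answer is too complicated to be used. However, reasonable precision ordinarily is obtained by taking $a = k_1 - \tfrac12$ and $b = k_2 + \tfrac12$, halfway between the neighboring integers, that is, by computing $t_1$ and $t_2$ from the formulae

$$ t_1 = \frac{k_1 - \tfrac12 - np}{\sqrt{npq}}, \tag{4.5.30} $$

$$ t_2 = \frac{k_2 + \tfrac12 - np}{\sqrt{npq}}. \tag{4.5.31} $$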

Rule for using the Normal integral to compute approximate values of the sums of binomial probabilities. If $k_1$ and $k_2 \ge k_1 + 1$ are two integers and $X(n)$ stands for the binomial variable corresponding to given values of $p$ and $n$, then an approximate value of the probability $P\{k_1 \le X(n) \le k_2\}$ is obtained by computing $G(t_2) - G(t_1)$, where $t_1$ and $t_2$ are given by (4.5.30) and (4.5.31), respectively.

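Example. Let $p = .1$ and $n = 36$. To compute the probability $P\{6 \le X(n) \le 8\}$ we use formulae (4.5.30) and (4.5.31) to obtain

$$ t_1 = \frac{5.5 - 3.6}{\sqrt{36 \times .1 \times .9}} = \frac{1.9}{1.8} = 1.056, $$

$$ t_2 = \frac{8.5 - 3.6}{\sqrt{36 \times .1 \times .9}} = \frac{4.9}{1.8} = 2.722. $$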




Since both $t_1$ and $t_2$ are positive,

$$ G(t_2) - G(t_1) = H(t_2) - H(t_1), $$

and the Direct Table of the Normal Integral at the end of the book gives

$$ P\{6 \le X(n) \le 8\} \approx 0.4968 - 0.3543 = 0.1425. $$

Here $\approx$ indicates approximate equality. The student is invited to verify by direct computation that the true value of the probability is 0.1377.
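The verification suggested above can be carried out numerically. The following is a minimal Python sketch, assuming that $G(t)$ denotes the Normal integral from $-\infty$ to $t$, computed here with `math.erf` in place of the Direct Table; the helper names `exact_sum` and `normal_approx` are illustrative only. It reproduces the example $p = .1$, $n = 36$, $6 \le X(n) \le 8$.

```python
import math

def binomial_prob(k: int, n: int, p: float) -> float:
    """P{X(n) = k} for the binomial variable X(n)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def G(t: float) -> float:
    """Normal integral from -infinity to t (assumed meaning of G)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def exact_sum(k1: int, k2: int, n: int, p: float) -> float:
    """Exact value of P{k1 <= X(n) <= k2}."""
    return sum(binomial_prob(k, n, p) for k in range(k1, k2 + 1))

def normal_approx(k1: int, k2: int, n: int, p: float) -> float:
    """G(t2) - G(t1) with t1, t2 taken from formulae (4.5.30) and (4.5.31)."""
    sd = math.sqrt(n * p * (1 - p))
    t1 = (k1 - 0.5 - n * p) / sd
    t2 = (k2 + 0.5 - n * p) / sd
    return G(t2) - G(t1)

exact = exact_sum(6, 8, 36, 0.1)
approx = normal_approx(6, 8, 36, 0.1)
print(f"exact  = {exact:.4f}")   # 0.1377
print(f"normal = {approx:.4f}")
```

The Normal value printed here may differ from 0.1425 in the last digit, because `math.erf` is evaluated at the unrounded $t_1$ and $t_2$ rather than at the three-decimal values read from a table.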

The foregoing discussion applies to computing the probability that $k_1 \le X(n) \le k_2$, where $k_1$ is an integer greater than zero and $k_2$ an integer less than $n$, so that the double inequality limits the variation of $X(n)$ on both ends of the range of its possible values. This problem should be distinguished from the problems of determining the probabilities

$$ P\{0 \le X(n) \le k_2\} \quad \text{with } k_2 < n $$

and

$$ P\{k_1 \le X(n) \le n\} \quad \text{with } k_1 > 0. $$

Since, by the nature of its definition, $X(n)$ cannot have values less than zero, the first of these probabilities reduces to

$$ P\{-\infty < X(n) \le k_2\}. $$

Also, since the value of $X(n)$ cannot exceed $n$, the second probability coincides with

$$ P\{k_1 \le X(n) < +\infty\}. $$

In consequence, the approximations to these probabilities are obtained by taking $t_1 = -\infty$ and $t_2$ from formula (4.5.31) in the first case, and by taking $t_1$ from formula (4.5.30) and $t_2 = +\infty$ in the second. It will be remembered that $H(\infty) = .5$.
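For the one-sided cases just described, a sketch under the same assumption ($G$ is the Normal integral from $-\infty$ to $t$, computed with `math.erf`) can simply pass an infinite argument: `math.erf` returns $\pm 1$ at $\pm\infty$, so $G(-\infty) = 0$ and $G(+\infty) = 1$. The function name `approx_prob` is hypothetical. The lower-tail call reproduces the Normal entry .5228 of Table 4.10 for $p = .1$, $n = 34$, $t = 3$.

```python
import math

def G(t: float) -> float:
    """Normal integral from -infinity to t; G(-inf) = 0.0 and G(+inf) = 1.0."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def approx_prob(k1: int, k2: int, n: int, p: float) -> float:
    """Approximate P{k1 <= X(n) <= k2}; t1 = -inf when k1 = 0, t2 = +inf when k2 = n."""
    sd = math.sqrt(n * p * (1 - p))
    t1 = -math.inf if k1 <= 0 else (k1 - 0.5 - n * p) / sd   # formula (4.5.30)
    t2 = math.inf if k2 >= n else (k2 + 0.5 - n * p) / sd    # formula (4.5.31)
    return G(t2) - G(t1)

print(f"{approx_prob(0, 3, 34, 0.1):.4f}")    # lower tail P{0 <= X(n) <= 3}
print(f"{approx_prob(4, 34, 34, 0.1):.4f}")   # upper tail P{4 <= X(n) <= 34}
```

The two tails are complementary, so their approximations add up to unity, as they should.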

Thus far we have discussed formulae (4.5.30) and (4.5.31) in relation to problems which may be described as direct: given $n$, $p$, $k_1$ and $k_2$, compute the approximate value of the probability $P\{k_1 \le X(n) \le k_2\}$. In the following subsections and particularly in Chapter 5, we shall be faced occasionally with problems of the inverse type: given the value of the probability $P\{k_1 \le X(n) \le k_2\}$, to determine $k_1$ and $k_2$ subject to some other conditions. In problems of this kind, the values of $t_1$ and $t_2$ will be read from the Inverse Table of the Normal Integral, and $k_1$ and $k_2$ determined by solving (4.5.30) and (4.5.31):

$$ k_1 = np + t_1\sqrt{npq} + \tfrac12, \qquad k_2 = np + t_2\sqrt{npq} - \tfrac12. $$

Ordinarily the values so obtained will not be integers, and a complete solution would require rounding them off to the nearest integers and then, perhaps, verifying the solution by direct computation of the binomial probabilities. Verification of this kind occasionally reveals that the values of $k_1$ and $k_2$ sought differ by a unit or so from those computed using the Normal approximation as described above.
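The inverse procedure can be sketched in the same way. In the Python sketch below, `statistics.NormalDist().inv_cdf` stands in for the Inverse Table of the Normal Integral, and the target (a central probability of .95 for $n = 36$, $p = .1$) is an illustrative choice, not an example from the text. The raw solutions of (4.5.30) and (4.5.31) are rounded and then checked by direct computation of the binomial probabilities.

```python
import math
from statistics import NormalDist

def binomial_range(k1, k2, n, p):
    """Exact P{k1 <= X(n) <= k2}."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k1, k2 + 1))

def inverse_limits(t1, t2, n, p):
    """Solve (4.5.30) and (4.5.31) for k1 and k2, then round to the nearest integers."""
    sd = math.sqrt(n * p * (1 - p))
    k1_raw = n * p + t1 * sd + 0.5
    k2_raw = n * p + t2 * sd - 0.5
    return k1_raw, k2_raw, round(k1_raw), round(k2_raw)

n, p = 36, 0.1
z = NormalDist().inv_cdf(0.975)          # t2 = -t1 for a central probability of .95
k1_raw, k2_raw, k1, k2 = inverse_limits(-z, z, n, p)
print(f"raw: k1 = {k1_raw:.3f}, k2 = {k2_raw:.3f}; rounded: k1 = {k1}, k2 = {k2}")
print(f"direct check: P(k1 <= X <= k2) = {binomial_range(k1, k2, n, p):.4f}")
```

Whether a unit shift of $k_1$ or $k_2$ is then warranted depends on the other conditions of the particular problem, which this sketch does not model.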

The degree of approximation achieved in using the Normal integral depends on the values of $n$, $p$, $t_1$ and $t_2$. Whether or not a given degree of approximation is satisfactory depends on the nature of the problem treated. In treating problems in this book the student will not be seriously wrong in using the Normal integral instead of the binomial frequency function in all cases when $np(1 - p) \ge 3$. Table 4.10 illustrates the degree of approximation attained under this rule; in each of its five cases $np(1 - p)$ is close to 3.

Table 4.10

Comparison of $P\{X(n) \le t\}$ with Its Normal Approximation

          p = .5, n = 12                       p = .4, n = 13
  t   P{X(n) ≤ t}   Normal   Difference    P{X(n) ≤ t}   Normal   Difference
  0      .0002      .0008     -.0005          .0013      .0039     -.0026
  1      .0032      .0047     -.0015          .0126      .0181     -.0055
  2      .0193      .0216     -.0023          .0579      .0632     -.0053
  3      .0730      .0744     -.0014          .1686      .1679     +.0007
  4      .1938      .1932     +.0006          .3530      .3459     +.0071
  5      .3872      .3864     +.0008          .5744      .5674     +.0070
  6      .6128      .6136     -.0008          .7712      .7691     +.0021
  7      .8062      .8068     -.0006          .9023      .9035     -.0012
  8      .9270      .9255     +.0015          .9679      .9691     -.0012
  9      .9807      .9784     +.0023          .9922      .9925     -.0003
 10      .9968      .9953     +.0015          .9987      .9987      .0000
 11      .9998      .9993     +.0005          .9999      .9998     +.0001

          p = .3, n = 15                       p = .2, n = 19
  t   P{X(n) ≤ t}   Normal   Difference    P{X(n) ≤ t}   Normal   Difference
  0      .0047      .0121     -.0074          .0144      .0292     -.0148
  1      .0353      .0455     -.0102          .0829      .0936     -.0107
  2      .1268      .1299     -.0031          .2369      .2280     +.0089
  3      .2969      .2866     +.0103          .4551      .4317     +.0234
  4      .5155      .5000     +.0155          .6733      .6560     +.0173
  5      .7216      .7134     +.0082          .8369      .8352     +.0017
  6      .8688      .8701     -.0012          .9324      .9393     -.0069
  7      .9500      .9545     -.0045          .9767      .9831     -.0064
  8      .9848      .9879     -.0031          .9933      .9965     -.0032
  9      .9963      .9976     -.0013          .9984      .9995     -.0011
 10      .9993      .9996     -.0003          .9997      .9999     -.0003
 11      .9999      .9999      .0000         1.0000     1.0000      .0000

          p = .1, n = 34
  t   P{X(n) ≤ t}   Normal   Difference
  0      .0278      .0487     -.0209
  1      .1329      .1387     -.0058
  2      .3255      .3034     +.0221
  3      .5538      .5228     +.0310
  4      .7504      .7353     +.0151
  5      .8815      .8850     -.0035
  6      .9518      .9618     -.0090
  7      .9831      .9905     -.0074
  8      .9949      .9982     -.0033
  9      .9986      .9998     -.0012
 10      .9997     1.0000     -.0003
 11      .9999     1.0000     -.0001
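The entries of Table 4.10 can be recomputed with a short script. The sketch below assumes that the column “Normal” is $G(t_2)$ with $t_2 = (t + \tfrac12 - np)/\sqrt{npq}$ from (4.5.31) and $t_1 = -\infty$, an assumption consistent with, for example, the entries .0487 and .5228 for $p = .1$, $n = 34$; the function name `table_rows` is illustrative. Occasional last-digit differences from the printed table may arise from rounding.

```python
import math

CASES = [(0.5, 12), (0.4, 13), (0.3, 15), (0.2, 19), (0.1, 34)]

def G(t):
    """Normal integral from -infinity to t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def table_rows(n, p, t_max=11):
    """Rows (t, exact P{X(n) <= t}, Normal approximation, difference)."""
    sd = math.sqrt(n * p * (1 - p))
    rows, cum = [], 0.0
    for t in range(t_max + 1):
        cum += math.comb(n, t) * p**t * (1 - p) ** (n - t)
        normal = G((t + 0.5 - n * p) / sd)      # formula (4.5.31) with t1 = -inf
        rows.append((t, cum, normal, cum - normal))
    return rows

for p, n in CASES:
    print(f"p = {p}, n = {n}, npq = {n * p * (1 - p):.2f}")
    for t, exact, normal, diff in table_rows(n, p):
        print(f"  {t:2d}  {exact:.4f}  {normal:.4f}  {diff:+.4f}")
```

Each case sits just above the threshold $np(1 - p) = 3$ of the rule, which is what makes the table a fair test of it.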

4.5.6. Geometric interpretation of the theorem of Laplace. The theorem of Laplace is concerned with the random variable, say $X^*(n)$, connected with the binomial variable $X(n)$ by the equation

$$ X^*(n) = \frac{X(n) - np}{\sqrt{npq}}. $$

In fact, the theorem asserts that, whatever be $t_1 < t_2$,

$$ \lim_{n \to \infty} P\{t_1 < X^*(n) < t_2\} = \frac{1}{\sqrt{2\pi}}\int_{t_1}^{t_2} e^{-x^2/2}\,dx. $$

The variable $X^*(n)$ will be described as the standardized or normalized binomial variable. Normalization will be understood to mean (i) measuring the value of $X(n)$ not from the origin of coordinates but from the point, varying with $n$, corresponding to the abscissa $np$, and (ii) using $\sqrt{npq}$ as the unit for measuring the deviations $X(n) - np$.

It is obvious that the variable $X^*(n)$ is capable of assuming only $n + 1$ different values, which correspond to the possible values of $X(n)$,

$$ \frac{-np}{\sqrt{npq}},\ \frac{1 - np}{\sqrt{npq}},\ \frac{2 - np}{\sqrt{npq}},\ \ldots,\ \frac{n - np}{\sqrt{npq}}. $$





Let $x_k(n)$ denote the $(k + 1)$st possible value of $X^*(n)$, i.e., $x_k(n) = (k - np)/\sqrt{npq}$. It is seen that the possible values of $X^*(n)$ are equally spaced and proceed at increments equal to, say,

$$ \Delta_n = \frac{1}{\sqrt{npq}}. $$

Obviously, the probability that X*(n) will assume one of its possible values 
is equal to the probability that X(n) will assume the corresponding 
integer value. 

In order to establish the relation between the probability $P\{t_1 < X^*(n) < t_2\}$ and the area under the Normal curve

$$ y = g(x), $$

we shall prepare a special graph of the frequency function of the variable $X^*(n)$ in which the non-zero values of this function will be represented by areas of rectangles rather than by ordinates of points. On the $Ox$ axis we mark all the possible values $x_k(n)$, $k = 0, 1, 2, \ldots, n$, of the variable $X^*(n)$. As already mentioned, the distance between two successive points is always equal to $\Delta_n = 1/\sqrt{npq}$. Let $\delta_k(n)$ denote the interval of length $\Delta_n$ which has $x_k(n)$ as its center. Thus the interval $\delta_k(n)$ extends from $x_k(n) - \tfrac12\Delta_n$ to $x_k(n) + \tfrac12\Delta_n$. Now let $R_k(n)$ denote a rectangle with $\delta_k(n)$ as its base and with $y_k(n)$, its height, so adjusted that the area of $R_k(n)$ is equal to the probability that $X^*(n)$ will assume the value $x_k(n)$. It follows that

$$ y_k(n)\,\Delta_n = P\{X^*(n) = x_k(n)\} = \frac{n!}{k!\,(n - k)!}\,p^k q^{n-k}. \tag{4.5.32} $$







The construction of two adjoining rectangles $R_{k-1}(n)$ and $R_k(n)$ is illustrated in Figure 22. Figure 23 gives the series of rectangles $R_k(n)$ corresponding to $p = \tfrac12$ and to $n = 2$, 8, and 14. Figure 24 corresponds to these same values of $n$ but to a different value of $p$. The continuous curve in each figure is the Normal curve $y = g(x)$.





Figure 23. Normalized Binomial, $p = \tfrac12$, and the Normal Curve.


It is seen that, even with very small values of $n$, the Normal curve fits the rectangles reasonably well. If two points $a < b$ are marked on the horizontal axis, then the probability that $a < X^*(n) < b$ will be equal to the sum of areas of those rectangles $R_k(n)$ whose centers fall between the verticals through $a$ and $b$. In other words, if $x_i(n)$ is the smallest of the possible values of $X^*(n)$ which is greater than $a$, and if $x_j(n)$ is the largest of the possible values of $X^*(n)$ which is less than $b$, then the exact probability $P\{a < X^*(n) < b\}$ is equal to the area of the rectangles bounded on the left by $x = x_i(n) - \tfrac12\Delta_n$ and on the right by $x = x_j(n) + \tfrac12\Delta_n$. To approximate this area by the area under the Normal curve, the latter should be taken between the same limits. With $x_i(n) = (k_1 - np)/\sqrt{npq}$ and $x_j(n) = (k_2 - np)/\sqrt{npq}$, these limits are exactly the $t_1$ and $t_2$ of (4.5.30) and (4.5.31). This is the geometric interpretation of formulae (4.5.30) and (4.5.31).

Figure 24. Normalized Binomial and the Normal Curve.
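The rectangle construction can be tabulated numerically. The following sketch (the function name `rectangles` is hypothetical) computes the centers $x_k(n)$ and the heights $y_k(n) = P\{X^*(n) = x_k(n)\}/\Delta_n$ of (4.5.32) for $p = \tfrac12$ and $n = 2$, 8 and 14, and reports the largest vertical gap at the centers between the tops of the rectangles and the Normal curve $g(x)$. It is only a numerical companion to the figures.

```python
import math

def g(x):
    """Normal probability density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def rectangles(n, p):
    """Width Delta_n, centres x_k(n) and heights y_k(n) of the rectangles R_k(n), k = 0..n."""
    sd = math.sqrt(n * p * (1 - p))
    delta = 1 / sd
    xs = [(k - n * p) / sd for k in range(n + 1)]
    # (4.5.32): y_k(n) * Delta_n = P{X*(n) = x_k(n)}
    ys = [math.comb(n, k) * p**k * (1 - p) ** (n - k) / delta for k in range(n + 1)]
    return delta, xs, ys

results = {}
for n in (2, 8, 14):
    delta, xs, ys = rectangles(n, 0.5)
    total_area = sum(y * delta for y in ys)              # the areas are probabilities, so this is 1
    worst = max(abs(y - g(x)) for x, y in zip(xs, ys))   # largest gap at the centres
    results[n] = (total_area, worst)
    print(f"n = {n:2d}: total area = {total_area:.6f}, max |y_k(n) - g(x_k(n))| = {worst:.4f}")
```

The largest gap shrinks as $n$ grows, which is the numerical counterpart of the visual impression given by the figures.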

The rectangles $R_k(n)$ have an interesting property which illustrates their connection with the Normal probability density. Mark the centers of the upper sides of all the rectangles $R_k(n)$ and connect them by straight lines as indicated in Figure 25. The polygon so obtained will be denoted by $\pi(n)$. As $n$ is increased, the rectangles $R_k(n)$ become narrower and the sides of the polygon $\pi(n)$ shorter. Eventually, the polygon approaches a smooth curve. Fix a point $x$ and denote by $y(x, n)$ the ordinate of the corresponding point on the polygon $\pi(n)$. Let $x_{k-1}(n)$ and $x_k(n)$ be the two possible values of $X^*(n)$ nearest to $x$, so that

$$ x_{k-1}(n) = \frac{k - 1 - np}{\sqrt{npq}} \le x < x_k(n) = \frac{k - np}{\sqrt{npq}}. \tag{4.5.33} $$

The object of our study will be the slope of the polygon $\pi(n)$ at $x$, say $\alpha(x, n)$, and the ordinate $y(x, n)$ of the point on the polygon, also taken at $x$. More specifically, we will study the quotient, say

$$ r_n(x) = \frac{\alpha(x, n)}{y(x, n)}. \tag{4.5.34} $$


We will let $n$ increase indefinitely, find the limit $r(x)$ of $r_n(x)$, and then look for a smooth curve $L$ such that: (i) at each point $x$ the slope of $L$ divided by the ordinate is equal to $r(x)$; and (ii) the area under the curve $L$ is equal to unity. We shall find that the only curve satisfying these conditions is the Normal probability density $y = g(x)$.

The slope of the polygon is computed from the formula

$$ \alpha(x, n) = \frac{y_k(n) - y_{k-1}(n)}{\Delta_n}. \tag{4.5.35} $$







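Solving (4.5.32) for $y_k(n)$ and substituting into (4.5.35) we find

$$ \begin{aligned} \alpha(x, n) &= \frac{1}{\Delta_n^2}\left\{\frac{n!}{k!\,(n-k)!}\,p^k q^{n-k} - \frac{n!}{(k-1)!\,(n-k+1)!}\,p^{k-1} q^{n-k+1}\right\} \\ &= npq\,\frac{n!}{k!\,(n-k+1)!}\,p^{k-1} q^{n-k}\,\{np + p - k\}. \end{aligned} \tag{4.5.36} $$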


Upon inspecting the inequalities (4.5.33), we see that $x$ is connected with $k$ and $n$ by the relation

$$ x = \frac{k - \theta - np}{\sqrt{npq}}, $$

where $\theta$ is a number between zero and unity. Thus, if $x$ remains fixed while $n$ is indefinitely increased, the value of $k$ satisfying (4.5.33) is

$$ k = \theta + np + x\sqrt{npq}. $$

Substituting this value into the expression in braces in (4.5.36), we obtain

$$ \alpha(x, n) = npq\,\frac{n!}{k!\,(n-k+1)!}\,p^{k-1} q^{n-k}\,\{p - \theta - x\sqrt{npq}\}. \tag{4.5.37} $$


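We now compute the ordinate $y(x, n)$. From similar triangles in Figure 26, and since $x_k(n) - x = \theta\Delta_n$, it is seen that

$$ \frac{y_k(n) - y(x, n)}{x_k(n) - x} = \frac{y_k(n) - y_{k-1}(n)}{\Delta_n} $$

and, therefore,

$$ y_k(n) - y(x, n) = \theta\,[y_k(n) - y_{k-1}(n)], $$

$$ y(x, n) = \theta\,y_{k-1}(n) + (1 - \theta)\,y_k(n). $$

Using the expressions for $y_{k-1}(n)$ and $y_k(n)$ and performing easy transformations like those leading to the expression for $\alpha(x, n)$, we find that

$$ y(x, n) = (npq)^{3/2}\,\frac{n!}{k!\,(n-k+1)!}\,p^{k-1} q^{n-k}\left\{1 - \frac{(p - \theta)\,x}{\sqrt{npq}} + \frac{(p - \theta)^2 + pq}{npq}\right\}. \tag{4.5.38} $$

Substituting (4.5.37) and (4.5.38) into (4.5.34) we find

$$ r_n(x) = \frac{-x + \dfrac{p - \theta}{\sqrt{npq}}}{1 - \dfrac{(p - \theta)\,x}{\sqrt{npq}} + \dfrac{(p - \theta)^2 + pq}{npq}}. $$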



Since $\theta$ remains between zero and unity, every term with $\sqrt{npq}$ or $npq$ in its denominator tends to zero, and this formula shows that, whatever be $x$,

$$ \lim_{n \to \infty} r_n(x) = r(x) = -x. $$

This terminates the first part of our study. The second part consists of determining a smooth curve $L$ such that: (i) its slope divided by its ordinate is always equal to $r(x) = -x$; and (ii) the area under $L$ is equal to unity. If $y = y(x)$ stands for the equation of $L$, then condition (i) implies

$$ \frac{1}{y(x)}\,\frac{dy(x)}{dx} = -x. \tag{4.5.39} $$

Figure 26.

Plainly, the left-hand side of (4.5.39) is the derivative of $\log y(x)$. The most general function with derivative equal to the right-hand side of (4.5.39) is $(-\tfrac12 x^2 + \log C)$, where $C$ stands for an arbitrary positive constant. Therefore (4.5.39) is equivalent to

$$ \frac{d}{dx}\log y(x) = \frac{d}{dx}\left(-\tfrac12 x^2 + \log C\right). $$

Hence

$$ \log y(x) = -\tfrac12 x^2 + \log C, $$

or

$$ y(x) = C\,e^{-x^2/2}. \tag{4.5.40} $$

Equation (4.5.40) was deduced using condition (i) only. The additional requirement (ii) that the area under the curve $L$ must be equal to unity uniquely determines* $C = 1/\sqrt{2\pi}$. It follows that the only function satisfying the condition

$$ \frac{1}{y(x)}\,\frac{dy(x)}{dx} = \lim_{n \to \infty} \frac{\alpha(x, n)}{y(x, n)} $$

and also the condition that the area under the corresponding curve $L$ is equal to unity, is the Normal probability density function.

The foregoing reasoning is occasionally interpreted as equivalent to a proof of the theorem of Laplace. The student will perceive that this is not so. However, the proposition proved is interesting because it establishes a link between the polygon $\pi(n)$ and the Normal probability density and makes the statement of the theorem of Laplace very plausible.
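The limit $r_n(x) \to -x$ can also be observed numerically. The sketch below builds the polygon $\pi(n)$ locally: it finds $k$ from (4.5.33), computes $\theta$, the slope (4.5.35) and the interpolated ordinate $y(x, n)$, and compares the quotient with the closed form obtained by substituting (4.5.37) and (4.5.38) into (4.5.34). Binomial probabilities are evaluated through `math.lgamma` to avoid overflow for large $n$; the function names and the choice $p = .3$, $x = .5$ are illustrative.

```python
import math

def log_binom_pmf(k, n, p):
    """log P{X(n) = k}; log-gamma avoids overflow of n! for large n."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def polygon_ratio(x, n, p):
    """r_n(x) = alpha(x, n) / y(x, n) for the polygon pi(n), together with theta."""
    s = math.sqrt(n * p * (1 - p))                      # sqrt(npq) = 1 / Delta_n
    k = math.floor(n * p + x * s) + 1                   # x_{k-1}(n) <= x < x_k(n), as in (4.5.33)
    theta = k - n * p - x * s                           # 0 < theta <= 1
    y_prev = math.exp(log_binom_pmf(k - 1, n, p)) * s   # y_{k-1}(n) = P / Delta_n
    y_k = math.exp(log_binom_pmf(k, n, p)) * s          # y_k(n)
    alpha = (y_k - y_prev) * s                          # slope (4.5.35)
    y = theta * y_prev + (1 - theta) * y_k              # ordinate by linear interpolation
    return alpha / y, theta

def closed_form(x, n, p, theta):
    """r_n(x) from (4.5.37) and (4.5.38)."""
    s = math.sqrt(n * p * (1 - p))
    d = p - theta
    return (d / s - x) / (1 - d * x / s + (d * d + p * (1 - p)) / (s * s))

for n in (20, 200, 2000, 20000):
    r, theta = polygon_ratio(0.5, n, 0.3)
    print(f"n = {n:5d}: r_n(0.5) = {r:.5f}, closed form = {closed_form(0.5, n, 0.3, theta):.5f}")
```

The two columns agree for every $n$, and both approach $-0.5$ as $n$ increases.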

★4.6. Proof of the Theorem of Laplace

★4.6.1. Prerequisites from calculus. Although the proof of the theorem of Laplace requires that the student be familiar with calculus, the mathematical prerequisites are very limited. In fact, to follow the proof, the student must be familiar with the following concepts only: (i) concept of a limit, (ii) concept of a continuous function, (iii) concept of a derivative and Taylor's formula, and (iv) concept of the definite integral of a continuous function. The first two items are likely to be known to most of the readers of this book. However, it may be useful to elaborate a little on (iii) and (iv).

★(iii). Taylor's Formula. Let $f(x)$ be a function of the real variable $x$ defined over an interval $(a, b)$ and differentiable $n$ times at any point of this interval. Let $x_0$ be a fixed point and $x$ a variable point, both contained in $(a, b)$. Finally, let $f^{(k)}(x)$ denote the $k$th derivative of $f(x)$. Then Taylor's formula is

$$ f(x) = f(x_0) + \frac{x - x_0}{1!}\,f'(x_0) + \frac{(x - x_0)^2}{2!}\,f''(x_0) + \cdots + \frac{(x - x_0)^{n-1}}{(n-1)!}\,f^{(n-1)}(x_0) + R_n, $$

where the term $R_n$, called the remainder of order $n$, is given by

$$ R_n = \frac{(x - x_0)^n}{n!}\,f^{(n)}\bigl(x_0 + \theta(x - x_0)\bigr), $$

with $\theta$ denoting a function of $x$ about which it is known that $0 \le \theta \le 1$.

* The proof that $C = 1/\sqrt{2\pi}$ requires the computation of a double integral and is not given here.




★In proving the theorem of Laplace we shall apply Taylor's formula to the natural logarithm of $1 + x$, that is, to the function

$$ f(x) = \log(1 + x). $$

We shall take $n = 3$ and $x_0 = 0$. Differentiating three times in succession, we obtain

$$ f'(x) = \frac{1}{1 + x}, \qquad f''(x) = -\frac{1}{(1 + x)^2}, \qquad f'''(x) = \frac{2}{(1 + x)^3}. $$

Hence, for $x = x_0 = 0$, $f(x_0) = 0$, $f'(x_0) = 1$, $f''(x_0) = -1$, and Taylor's formula gives

$$ \log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3}\cdot\frac{1}{(1 + \theta x)^3}. \tag{4.6.1} $$


Since $\log(1 + x)$ is defined and differentiable within any interval $(a, b)$ provided $a > -1$, it follows that formula (4.6.1) is applicable to any $x > -1$.
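Formula (4.6.1) asserts only that, for each $x > -1$, some $\theta$ between zero and unity exists. The sketch below (the function name is hypothetical) solves (4.6.1) for $\theta$ at a few values of $x$, using `math.log1p` for $\log(1 + x)$, so that one can see $\theta$ does fall in $[0, 1]$. It is an illustration, not a proof, and it avoids $x = 0$, where the remainder vanishes and $\theta$ is not determined.

```python
import math

def theta_in_remainder(x):
    """Solve (4.6.1) for theta: log(1+x) = x - x**2/2 + x**3 / (3 * (1 + theta*x)**3).

    Valid for x > -1, x != 0.
    """
    remainder = math.log1p(x) - x + x * x / 2      # equals x**3 / (3 * (1 + theta*x)**3)
    cube = x ** 3 / (3 * remainder)                # equals (1 + theta*x)**3, which is positive
    return (cube ** (1 / 3) - 1) / x

for x in (-0.9, -0.3, 0.1, 0.5, 2.0, 10.0):
    print(f"x = {x:5}: theta = {theta_in_remainder(x):.4f}")
```

As the remark in the text says, $\theta$ changes with $x$; only its range is known in advance.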

★(iv). Definite integral. Let $f(x)$ be a function of a real variable $x$ defined and continuous for all values of $x$ between the limits $a \le x \le b$, where $a$ and $b > a$ are any numbers. Divide the interval $(a, b)$ into $m$ partial intervals by marking arbitrary points $a_1 < a_2 < \cdots < a_{m-1}$. Also let $a_0 = a$ and $a_m = b$. The $m + 1$ points $a_0 = a < a_1 < a_2 < \cdots < a_{m-1} < a_m = b$ will be described as an “m-grid.” The situation is illustrated in Figure 27. The interval $(a_{k-1}, a_k)$ is described as the $k$th “cell” of the grid.

★Let $\Delta(m)$ denote the greatest of the $m$ differences $a_k - a_{k-1}$. Consider one of the intervals $(a_{k-1}, a_k)$ and select in it an arbitrary point $x_k$ so that $a_{k-1} \le x_k \le a_k$. The point $x_k$ will be described as the “designated” point in the $k$th cell. Choose designated points in each of the $m$ cells into which $(a, b)$ is divided and form the sum, say,

$$ S(m) = \sum_{k=1}^{m} (a_k - a_{k-1})\,f(x_k). \tag{4.6.2} $$


★The sum $S(m)$ is called a Riemann sum. Suppose that $m$ is given ever-increasing values $m_1 < m_2 < \cdots < m_r < \cdots$ and that the operations just described are repeated for each of these values. As a result we shall obtain a sequence of Riemann sums, say $S(m_1), S(m_2), \ldots, S(m_r), \ldots$. In calculus courses it is proved that if the maximum cell length $\Delta(m_r)$ in the $m_r$-grid tends to zero as $r \to \infty$, then the sequence of Riemann sums tends to a limit which depends on the function $f(x)$ and on the interval but is independent of the particular grids used and of the choice within each cell of the designated points $x_k$. This limit is called the definite integral in the sense of Riemann, or simply the integral, of the function $f(x)$, taken between the limits $a$ and $b$, and is denoted by

$$ \lim_{\substack{m_r \to \infty \\ \Delta(m_r) \to 0}} S(m_r) = \int_a^b f(x)\,dx. $$

★The reader will perceive that the Riemann sum $S(m)$ has a simple geometrical interpretation. Namely, $S(m)$ is the sum of areas of rectangles built on each subinterval $(a_{k-1}, a_k)$ as a base with height equal to $f(x_k)$, for $k = 1, 2, \ldots, m$. A few such rectangles are marked by dotted lines in Figure 27.
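The definition can be tried out on the Normal density $g(x)$ itself. The following sketch (function names hypothetical; the interval $(-1, 2)$ is an arbitrary choice) draws a random $m$-grid, picks a random designated point in each cell, forms $S(m)$ as in (4.6.2), and compares it with the integral computed from `math.erf`. A fixed seed makes the run reproducible.

```python
import math
import random

def g(x):
    """Normal probability density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def riemann_sum(f, a, b, m, rng):
    """S(m) of (4.6.2) on a random m-grid, with random designated points."""
    inner = sorted(rng.uniform(a, b) for _ in range(m - 1))
    grid = [a] + inner + [b]                      # a_0 = a < a_1 < ... < a_m = b
    total, widest = 0.0, 0.0
    for left, right in zip(grid, grid[1:]):
        x_k = rng.uniform(left, right)            # designated point in the cell
        total += (right - left) * f(x_k)
        widest = max(widest, right - left)        # tracks the maximum cell length Delta(m)
    return total, widest

rng = random.Random(1)
exact = 0.5 * (math.erf(2 / math.sqrt(2)) - math.erf(-1 / math.sqrt(2)))
for m in (10, 100, 1000, 10000):
    s, widest = riemann_sum(g, -1.0, 2.0, m, rng)
    print(f"m = {m:5d}: max cell = {widest:.5f}, S(m) = {s:.6f}, integral = {exact:.6f}")
```

As the maximum cell length shrinks, the sums settle down near the integral regardless of how the grid and the designated points were drawn.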



★4.6.2. General idea of the proof. After these preliminaries we may proceed to the proof of the theorem of Laplace as stated in subsection 4.5.4. Let $t_1$ and $t_2 > t_1$ be arbitrary numbers and let $X(n)$ be a binomial variable with frequency function, say,

$$ P\{X(n) = k\} = p_X(k \mid n) = C_n^k\,p^k (1 - p)^{n-k}, $$

where $p$ is a fixed number, $0 < p < 1$, and $k = 0, 1, 2, \ldots, n$. In order to prove the theorem of Laplace, we shall compute the limit, as $n$ is indefinitely increased, of the probability, say $P(t_1, t_2, n)$, that $X(n)$ will satisfy the double inequality

$$ t_1 < \frac{X(n) - np}{\sqrt{np(1 - p)}} < t_2. \tag{4.6.3} $$








★In this connection it is essential to be quite clear about the nature of the probability $P(t_1, t_2, n)$. The reader will easily perceive that the double inequality (4.6.3) is equivalent to the double inequality

$$ np + t_1\sqrt{np(1 - p)} < X(n) < np + t_2\sqrt{np(1 - p)}. \tag{4.6.4} $$

For each $n$, denote by $A(n)$ the smallest integer which exceeds $np + t_1\sqrt{np(1 - p)}$ and by $B(n)$ the greatest integer which is less than $np + t_2\sqrt{np(1 - p)}$. It is clear that in order to satisfy the double inequality (4.6.4), and hence (4.6.3), the random variable $X(n)$ must have for its value either $A(n)$ or $A(n) + 1$ or $A(n) + 2$, etc., or $B(n)$, but no other value. It follows that the probability $P(t_1, t_2, n)$ is equal to the sum

$$ P(t_1, t_2, n) = \sum_{k=A(n)}^{B(n)} p_X(k \mid n) = \sum_{k=A(n)}^{B(n)} C_n^k\,p^k (1 - p)^{n-k}. \tag{4.6.5} $$

The limit of this sum is difficult to compute because when $n$ is increased the sum changes in two respects simultaneously: the value of each term $p_X(k \mid n)$ changes and also the limits of summation $A(n)$ and $B(n)$ change. According to the definition of these limits we have

$$ np + t_1\sqrt{np(1 - p)} < A(n) \le np + t_1\sqrt{np(1 - p)} + 1 \tag{4.6.6} $$

and

$$ np + t_2\sqrt{np(1 - p)} - 1 \le B(n) < np + t_2\sqrt{np(1 - p)}. \tag{4.6.7} $$

★It is seen that when $n \to \infty$

$$ \lim_{n \to \infty} A(n) = +\infty, \tag{4.6.8} $$

$$ \lim_{n \to \infty} B(n) = +\infty, \tag{4.6.9} $$

and

$$ \lim_{n \to \infty} [B(n) - A(n)] = +\infty. \tag{4.6.10} $$



★Formula (4.6.8) follows from the fact that, whatever be $t_1$,

$$ np + t_1\sqrt{np(1 - p)} = \sqrt{n}\,\bigl[p\sqrt{n} + t_1\sqrt{p(1 - p)}\bigr]. $$

When $n$ is indefinitely increased, then eventually the expression in brackets in the right-hand side will become greater than unity, and hence the term in the left-hand side greater than $\sqrt{n}$. Thus $A(n)$ must exceed $\sqrt{n}$ for sufficiently large values of $n$ and tend to infinity as $n \to \infty$. A similar argument leads to the conclusion that $B(n)$ must also tend to infinity when $n \to \infty$.

★In order to prove (4.6.10), we notice that, by (4.6.6) and (4.6.7),

$$ B(n) - A(n) \ge (t_2 - t_1)\sqrt{np(1 - p)} - 2. $$

Since $t_2 - t_1 > 0$, the right-hand side of this formula tends to infinity when $n \to \infty$, and this implies (4.6.10).
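The behavior of the limits of summation is easy to tabulate. The sketch below (names hypothetical; the values $p = .3$, $t_1 = -1$, $t_2 = 2$ are arbitrary) computes $A(n)$ and $B(n)$ from their definitions, checks (4.6.6), (4.6.7) and the lower bound for $B(n) - A(n)$ just used, and shows all three quantities growing with $n$.

```python
import math

def summation_limits(n, p, t1, t2):
    """A(n) = smallest integer > np + t1*s, B(n) = greatest integer < np + t2*s."""
    s = math.sqrt(n * p * (1 - p))
    A = math.floor(n * p + t1 * s) + 1
    B = math.ceil(n * p + t2 * s) - 1
    return A, B, s

p, t1, t2 = 0.3, -1.0, 2.0
history = []
for n in (10, 100, 1000, 10000, 100000):
    A, B, s = summation_limits(n, p, t1, t2)
    assert n * p + t1 * s < A <= n * p + t1 * s + 1     # (4.6.6)
    assert n * p + t2 * s - 1 <= B < n * p + t2 * s     # (4.6.7)
    assert B - A >= (t2 - t1) * s - 2                   # bound used for (4.6.10)
    history.append((n, A, B))
    print(f"n = {n:6d}: A(n) = {A:5d}, B(n) = {B:5d}, B(n) - A(n) = {B - A}")
```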

★After this analysis we find that as $n$ is increased, the sum in the right-hand side of (4.6.5) changes as follows: (a) the terms $p_X(k \mid n)$ change, (b) the number of terms tends to infinity, and (c) both of the limits of summation, $A(n)$ and $B(n)$, tend to infinity. In these circumstances, and since each term $p_X(k \mid n)$ is rather complicated, it is clear that any hope of computing the limit of the sum $P(t_1, t_2, n)$ depends on the possibility of reducing the problem to a simpler problem. Thus, in order to compute the limit of the sum of complicated terms

$$ P(t_1, t_2, n) = \sum_{k=A(n)}^{B(n)} p_X(k \mid n), $$

we shall seek a sum of simpler terms, say,

$$ P_1(t_1, t_2, n) = \sum_{k=A(n)}^{B(n)} p_1(k \mid n), $$

such that there is a guarantee that the two sums have the same limit.


★4.6.3. Lemma of Duhamel. The construction of a sum $P_1(t_1, t_2, n)$ satisfying the above conditions is based on the following lemma ascribed to Duhamel. To formulate this lemma, consider two infinite sequences of sums of positive terms

$$ S_n = \sum_{i=1}^{N(n)} T_{ni} \quad\text{and}\quad S'_n = \sum_{i=1}^{N(n)} T'_{ni}, $$

for $n = 1, 2, \ldots$. Each of the two corresponding sums $S_n$ and $S'_n$ contains the same number $N(n)$ of terms. The $i$th term of the sum $S_n$ is designated by $T_{ni}$ and the $i$th term in the sum $S'_n$ by $T'_{ni}$. When $n$ is increased, then both the number $N(n)$ and the nature of the terms $T_{ni}$ and $T'_{ni}$ are allowed to change.

★Lemma 4.1. If the sequence of sums $\{S_n\}$ tends to a finite limit $L$ as $n$ is increased,

$$ \lim_{n \to \infty} S_n = L, \tag{4.6.11} $$

and if to every positive number $\epsilon$ there corresponds a number $n(\epsilon)$ such that the inequality $n > n(\epsilon)$ implies

$$ \left|\frac{T'_{ni}}{T_{ni}} - 1\right| < \epsilon $$

for $i = 1, 2, \ldots, N(n)$, then the sequence $\{S'_n\}$ must also tend to the same finite limit, namely,

$$ \lim_{n \to \infty} S'_n = L. $$

In order to prove the lemma we must prove that, whatever the number $\eta > 0$, there must exist a number $M(\eta)$ such that, when $n > M(\eta)$, then

$$ |S'_n - L| < \eta. $$

★Write

$$ S'_n - L = (S'_n - S_n) + (S_n - L). $$

Then it follows that

$$ |S'_n - L| \le |S'_n - S_n| + |S_n - L|. \tag{4.6.12} $$

★The hypothesis (4.6.11) implies that for every $\eta > 0$ there exists a number $M'(\eta)$ such that $n > M'(\eta)$ implies

$$ |S_n - L| < \tfrac12\eta, \quad\text{or}\quad L - \tfrac12\eta < S_n < L + \tfrac12\eta. \tag{4.6.13} $$

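We now write

$$ S'_n - S_n = \sum_{i=1}^{N(n)} (T'_{ni} - T_{ni}) = \sum_{i=1}^{N(n)} T_{ni}\left(\frac{T'_{ni}}{T_{ni}} - 1\right), $$

and it follows that

$$ |S'_n - S_n| \le \sum_{i=1}^{N(n)} T_{ni}\left|\frac{T'_{ni}}{T_{ni}} - 1\right|. \tag{4.6.14} $$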



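We set

$$ \epsilon = \frac{\tfrac12\eta}{L + \tfrac12\eta} $$

and use the second hypothesis of the lemma to assert that, whenever $n > n(\epsilon)$, then

$$ \left|\frac{T'_{ni}}{T_{ni}} - 1\right| < \frac{\tfrac12\eta}{L + \tfrac12\eta}. \tag{4.6.15} $$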


★Now denote by $M(\eta)$ the greater of the two numbers $M'(\eta)$ and $n(\epsilon)$, and consider values of $n$ such that $n > M(\eta)$. Then both (4.6.13) and (4.6.15) are satisfied. Using (4.6.15) and (4.6.14), we conclude that, for the same value of $n > M(\eta)$,

$$ |S'_n - S_n| < \frac{\tfrac12\eta}{L + \tfrac12\eta}\sum_{i=1}^{N(n)} T_{ni} = \frac{\tfrac12\eta}{L + \tfrac12\eta}\,S_n $$

and, because of (4.6.13),

$$ |S'_n - S_n| < \tfrac12\eta. \tag{4.6.16} $$




Then (4.6.16), (4.6.13) and (4.6.12) imply $|S'_n - L| < \eta$, and the lemma is proved.
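A toy instance may make the lemma concrete. In the sketch below, which is purely illustrative and not taken from the text, $T_{ni} = 1/n$ for $i = 1, \ldots, n$, so that $S_n = 1 = L$, and $T'_{ni} = T_{ni}(1 + \sin i/\sqrt{n})$, so that the quotients $T'_{ni}/T_{ni}$ differ from unity by at most $1/\sqrt{n}$, uniformly in $i$. The code checks inequality (4.6.14) and shows $S'_n$ approaching $L$.

```python
import math

def duhamel_pair(n):
    """Return S_n, S'_n and max_i |T'_ni / T_ni - 1| for the toy sequences."""
    T = [1.0 / n] * n                                                   # T_ni, i = 1..n
    T_prime = [t * (1 + math.sin(i) / math.sqrt(n)) for i, t in enumerate(T, start=1)]
    gap = max(abs(tp / t - 1) for t, tp in zip(T, T_prime))
    return sum(T), sum(T_prime), gap

for n in (10, 100, 1000, 10000):
    S, S_prime, gap = duhamel_pair(n)
    # inequality (4.6.14): |S'_n - S_n| <= sum_i T_ni |T'_ni/T_ni - 1| <= gap * S_n
    assert abs(S_prime - S) <= gap * S + 1e-12
    print(f"n = {n:5d}: S_n = {S:.6f}, S'_n = {S_prime:.6f}, max ratio gap = {gap:.5f}")
```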


★4.6.4. Proof of Laplace's theorem. The proof of the theorem of Laplace is obtained by applying the lemma of Duhamel two times in succession and by referring to the properties of Riemann sums as described in subsection 4.6.1. Our first step, then, consists in devising terms, say $p_1(k \mid n)$, simpler than the binomial probabilities $p_X(k \mid n)$ and satisfying the condition that whatever be $\epsilon > 0$, there exists a number $n_1(\epsilon)$ such that the inequality $n > n_1(\epsilon)$ implies

$$ \left|\frac{p_X(k \mid n)}{p_1(k \mid n)} - 1\right| < \epsilon $$

for all values of $k = A(n), A(n) + 1, \ldots, B(n)$. Then by the lemma of Duhamel, the existence of a finite limit $L$ of the sum, say

$$ P_1(t_1, t_2, n) = \sum_{k=A(n)}^{B(n)} p_1(k \mid n), $$

will imply that

$$ \lim_{n \to \infty} P(t_1, t_2, n) = L. $$


★The terms $p_1(k \mid n)$ which we shall introduce are simpler than the binomial probabilities $p_X(k \mid n)$. However, they are not simple enough to compute the limit $L$ directly. Therefore we shall repeat the device just described and introduce still simpler terms, say $p_2(k \mid n)$, which have the property that, for every $\epsilon > 0$, there exists a number $n_2(\epsilon)$ such that the inequality $n > n_2(\epsilon)$ implies

$$ \left|\frac{p_1(k \mid n)}{p_2(k \mid n)} - 1\right| < \epsilon $$

for all values of $k$ between the limits $A(n) \le k \le B(n)$. The sum, say $P_2(t_1, t_2, n)$, of terms $p_2(k \mid n)$ is simple enough to conclude that it tends to a finite limit,

$$ \lim_{n \to \infty} P_2(t_1, t_2, n) = \lim_{n \to \infty} \sum_{k=A(n)}^{B(n)} p_2(k \mid n) = L. $$

★Then the application of the lemma of Duhamel guarantees that the same number $L$ must be the limit of the sum $P_1(t_1, t_2, n)$ and, therefore, of $P(t_1, t_2, n)$.

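★In order to obtain the terms $p_1(k \mid n)$, we write the binomial probability $p_X(k \mid n)$ and replace the three factorials by their expressions, as provided by Stirling's formula given in subsection 4.5.1. We have

$$ p_X(k \mid n) = \frac{n^n e^{-n}\sqrt{2\pi n}\,e^{\theta_1/12n}\;p^k (1 - p)^{n-k}}{k^k e^{-k}\sqrt{2\pi k}\,e^{\theta_2/12k}\;(n - k)^{n-k} e^{-(n-k)}\sqrt{2\pi(n - k)}\,e^{\theta_3/12(n-k)}}, $$

where $\theta_1$, $\theta_2$ and $\theta_3$ are unknown numbers between zero and unity. Upon canceling equal factors in the numerator and denominator and rearranging, the above expression can be written as

$$ p_X(k \mid n) = p_1(k \mid n)\,e^{\frac{\theta_1}{12n} - \frac{\theta_2}{12k} - \frac{\theta_3}{12(n-k)}} $$

with

$$ p_1(k \mid n) = \frac{1}{\sqrt{np(1 - p)}\,\sqrt{2\pi}}\left(\frac{np}{k}\right)^{k + \frac12}\left(\frac{n(1 - p)}{n - k}\right)^{n - k + \frac12}. \tag{4.6.17} $$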


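★We shall show that the terms $p_1(k \mid n)$ so defined satisfy the conditions of the lemma of Duhamel, namely, that whatever $\epsilon > 0$, there exists a number $n_1(\epsilon)$ such that for all $n > n_1(\epsilon)$,

$$ \left|\frac{p_X(k \mid n)}{p_1(k \mid n)} - 1\right| < \epsilon $$

irrespective of the value of $k$ between the limits $A(n) \le k \le B(n)$. We have

$$ \frac{p_X(k \mid n)}{p_1(k \mid n)} = e^{\frac{\theta_1}{12n} - \frac{\theta_2}{12k} - \frac{\theta_3}{12(n-k)}} $$

and, because the $\theta$'s are between zero and unity,

$$ e^{-\frac{1}{12k} - \frac{1}{12(n-k)}} \le \frac{p_X(k \mid n)}{p_1(k \mid n)} \le e^{\frac{1}{12n}}. $$

Referring to (4.6.6) and (4.6.7) we find that, when $A(n) \le k \le B(n)$, then

$$ k \ge A(n) > \sqrt{n}\,\bigl[p\sqrt{n} + t_1\sqrt{p(1 - p)}\bigr] $$

and

$$ n - k \ge n - B(n) > n - np - t_2\sqrt{np(1 - p)} = \sqrt{n}\,\bigl[(1 - p)\sqrt{n} - t_2\sqrt{p(1 - p)}\bigr]. $$

★It follows that for all values of $k$ between the limits $A(n) \le k \le B(n)$ we have

$$ e^{-\frac{1}{12\sqrt{n}\,[p\sqrt{n} + t_1\sqrt{p(1-p)}]} - \frac{1}{12\sqrt{n}\,[(1-p)\sqrt{n} - t_2\sqrt{p(1-p)}]}} \le \frac{p_X(k \mid n)}{p_1(k \mid n)} \le e^{\frac{1}{12n}}. \tag{4.6.18} $$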




★As $n \to \infty$ both extreme parts of this double inequality tend to unity. This is quite obvious for the extreme right part. In order to see that this is also true for the extreme left, notice that, when $n$ is increased, the expressions in square brackets in the exponents will eventually exceed unity. Each of the two terms in the exponent is then less than $1/(12\sqrt{n})$, so that the whole expression in the extreme left part of the inequality will become greater than

$$ e^{-\frac{1}{6\sqrt{n}}}, $$

which tends to unity as $n \to \infty$. Thus, for any given $p$, $t_1$, $t_2$ and $\epsilon > 0$ with $0 < p < 1$, it is always possible to assign a number $n_1(\epsilon)$ such that for $n > n_1(\epsilon)$ the extreme right and the extreme left parts of the double inequality (4.6.18) will differ from unity by less than $\epsilon$. Then the same will be true for all the quotients $p_X(k \mid n)/p_1(k \mid n)$ for $A(n) \le k \le B(n)$. Q.E.D.
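The uniform closeness of the quotients $p_X(k \mid n)/p_1(k \mid n)$ to unity can be checked numerically. The sketch below (function names hypothetical; $p = .3$, $t_1 = -1$, $t_2 = 2$ chosen arbitrarily) evaluates both terms on the logarithmic scale, using `math.lgamma` for the factorials, verifies for every $k$ the bounds $e^{-1/12k - 1/12(n-k)} \le p_X/p_1 \le e^{1/12n}$ established above, and reports the worst quotient over $A(n) \le k \le B(n)$.

```python
import math

def log_px(k, n, p):
    """log p_X(k | n), the binomial probability."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def log_p1(k, n, p):
    """log p_1(k | n) from (4.6.17)."""
    q = 1 - p
    return (-0.5 * math.log(2 * math.pi * n * p * q)
            + (k + 0.5) * math.log(n * p / k)
            + (n - k + 0.5) * math.log(n * q / (n - k)))

def worst_gap(n, p, t1, t2):
    """max over A(n) <= k <= B(n) of |p_X/p_1 - 1|, checking the Stirling bounds on the way."""
    s = math.sqrt(n * p * (1 - p))
    A = math.floor(n * p + t1 * s) + 1
    B = math.ceil(n * p + t2 * s) - 1
    worst = 0.0
    for k in range(A, B + 1):
        ratio = math.exp(log_px(k, n, p) - log_p1(k, n, p))
        assert math.exp(-1 / (12 * k) - 1 / (12 * (n - k))) <= ratio <= math.exp(1 / (12 * n))
        worst = max(worst, abs(ratio - 1))
    return worst

gaps = {n: worst_gap(n, 0.3, -1.0, 2.0) for n in (20, 200, 2000, 20000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}: max |p_X/p_1 - 1| = {gap:.2e}")
```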

★The terms $p_1(k \mid n)$ are somewhat easier to handle than the binomial probabilities $p_X(k \mid n)$ because of the absence of factorials. However, the sum of terms $p_1(k \mid n)$ extended from $A(n)$ to $B(n)$ is still too complicated for convenient evaluation of its limit. Therefore, we proceed to a further simplification. For this purpose we set

$$ k = np + x_k(n)\sqrt{np(1 - p)}, \tag{4.6.19} $$

so that to each $n$ and $k$ there corresponds a perfectly defined number $x_k(n)$,

$$ x_k(n) = \frac{k - np}{\sqrt{np(1 - p)}}. \tag{4.6.20} $$

When $k$ assumes all integer values from $A(n)$ through $B(n)$, then $x_k(n)$ increases by equal steps of

$$ x_k(n) - x_{k-1}(n) = \frac{1}{\sqrt{np(1 - p)}} = \Delta_n \ \text{(say)}. $$

The length of the step tends to zero as $n \to \infty$. The first of the $x_k(n)$ is $x_{A(n)}(n)$, and, since $A(n)$ denotes the smallest integer larger than $np + t_1\sqrt{np(1 - p)}$,

$$ t_1 < x_{A(n)}(n) = \frac{A(n) - np}{\sqrt{np(1 - p)}} \le t_1 + \Delta_n. \tag{4.6.21} $$

Similarly, the last of the numbers $x_k(n)$ is $x_{B(n)}(n)$. Since $B(n)$ was defined as the greatest integer which is less than $np + t_2\sqrt{np(1 - p)}$, it follows that

$$ t_2 - \Delta_n \le x_{B(n)}(n) = \frac{B(n) - np}{\sqrt{np(1 - p)}} < t_2. \tag{4.6.22} $$

Formulae (4.6.21) and (4.6.22) will be used later.



Using the numbers $x_k(n)$, we shall write

$$ p_2(k \mid n) = \frac{\Delta_n}{\sqrt{2\pi}}\,e^{-\frac12 x_k^2(n)} $$

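and prove that the quotients $p_1(k \mid n)/p_2(k \mid n)$ satisfy the condition of the lemma of Duhamel. For this purpose we substitute (4.6.19) into the expression (4.6.17) and evaluate the logarithm of the reciprocal of $p_1(k \mid n)$. We have

$$ \begin{aligned} -\log p_1(k \mid n) = {} & \log\sqrt{2\pi} - \log\Delta_n \\ & + \Bigl(np + x_k(n)\sqrt{np(1 - p)} + \tfrac12\Bigr)\log\Bigl(1 + x_k(n)\sqrt{\tfrac{1 - p}{np}}\Bigr) \\ & + \Bigl(n(1 - p) - x_k(n)\sqrt{np(1 - p)} + \tfrac12\Bigr)\log\Bigl(1 - x_k(n)\sqrt{\tfrac{p}{n(1 - p)}}\Bigr). \end{aligned} $$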

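Now we use Taylor's formula, applying it to the evaluation of the two logarithms in the right-hand side. Specifically, using (4.6.1), we write, say,

$$ \begin{aligned} W_k &= -\log p_1(k \mid n) - \log\sqrt{2\pi} + \log\Delta_n \\ &= \Bigl(np + x_k(n)\sqrt{np(1 - p)} + \tfrac12\Bigr)\left[x_k(n)\sqrt{\tfrac{1 - p}{np}} - \frac{x_k^2(n)}{2}\,\frac{1 - p}{np} + \frac{x_k^3(n)}{3}\Bigl(\frac{1 - p}{np}\Bigr)^{3/2}\frac{1}{\bigl(1 + \theta_1 x_k(n)\sqrt{\frac{1 - p}{np}}\bigr)^3}\right] \\ &\quad + \Bigl(n(1 - p) - x_k(n)\sqrt{np(1 - p)} + \tfrac12\Bigr)\left[-x_k(n)\sqrt{\tfrac{p}{n(1 - p)}} - \frac{x_k^2(n)}{2}\,\frac{p}{n(1 - p)} - \frac{x_k^3(n)}{3}\Bigl(\frac{p}{n(1 - p)}\Bigr)^{3/2}\frac{1}{\bigl(1 - \theta_2 x_k(n)\sqrt{\frac{p}{n(1 - p)}}\bigr)^3}\right], \end{aligned} $$

where $\theta_1$ and $\theta_2$ are two numbers known to be between zero and unity. The work required in the two multiplications indicated is considerably simplified if one combines the multiplication with sorting the terms according to the power of $n$ they involve. Thus in each of the two products there will be exactly one term with $n$ raised to the power one-half. These two terms are

$$ np \cdot x_k(n)\sqrt{\tfrac{1 - p}{np}} = x_k(n)\sqrt{np(1 - p)} \quad\text{and}\quad -n(1 - p) \cdot x_k(n)\sqrt{\tfrac{p}{n(1 - p)}} = -x_k(n)\sqrt{np(1 - p)}, $$

and it is seen that they add up to zero. Further, in each of the two products there will be exactly two terms free from $n$. They are

$$ \bigl[-\tfrac12 x_k^2(n)(1 - p) + x_k^2(n)(1 - p)\bigr] + \bigl[-\tfrac12 x_k^2(n)\,p + x_k^2(n)\,p\bigr] = \tfrac12 x_k^2(n). $$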

As to the other terms in the two products, their denominators include $n$ raised to the power one-half or higher. The total of these terms can be written as

$$ \frac{U_k(n)}{\sqrt{n}}. $$

About the numerator $U_k(n)$ it is easy to prove the existence of a positive number $M(t_1, t_2, p)$ which depends on $t_1$, $t_2$ and $p$ but not on $n$ and $k$, such that, for sufficiently large values of $n$,

$$ |U_k(n)| < M(t_1, t_2, p) \tag{4.6.23} $$

no matter what the value of $k$ between the limits $A(n) \le k \le B(n)$.

Let $T$ stand for the greater of the absolute values $|t_1|$ and $|t_2|$. By (4.6.21) and (4.6.22), $t_1 < x_k(n) < t_2$, so that $|x_k(n)| \le T$, and this circumstance implies that every term included in $U_k(n)$ must be bounded. A possible doubt may concern the terms containing either

$$ \frac{1}{\bigl(1 + \theta_1 x_k(n)\sqrt{\frac{1 - p}{np}}\bigr)^3} \quad\text{or}\quad \frac{1}{\bigl(1 - \theta_2 x_k(n)\sqrt{\frac{p}{n(1 - p)}}\bigr)^3}. \tag{4.6.24} $$

In order to see that for sufficiently large values of $n$ these terms are bounded, notice that

$$ 1 + \theta_1 x_k(n)\sqrt{\frac{1 - p}{np}} \ge 1 - T\sqrt{\frac{1 - p}{np}} > \frac12, $$

provided

$$ n > 4T^2\,\frac{1 - p}{p}. $$

Thus, if $n$ satisfies this inequality, then the first of the two expressions (4.6.24) is less than 8, and hence all the terms in $U_k(n)$ involving this expression as a factor will be bounded. A similar argument applies to terms in $U_k(n)$ which involve as a factor the second of the expressions (4.6.24).

After establishing that (4.6.23) holds for sufficiently large values of $n$ and for all $k$ between the limits $A(n) \le k \le B(n)$, we return to the expression for $W_k$ and write

$$ W_k = -\log p_1(k \mid n) - \log\sqrt{2\pi} + \log\Delta_n = \tfrac12 x_k^2(n) + \frac{U_k(n)}{\sqrt{n}}, $$

which gives

$$ p_1(k \mid n) = \frac{\Delta_n}{\sqrt{2\pi}}\,e^{-\frac12 x_k^2(n)}\,e^{-U_k(n)/\sqrt{n}} = p_2(k \mid n)\,e^{-U_k(n)/\sqrt{n}}. \tag{4.6.25} $$


From this we conclude that the terms $p_2(k \mid n)$ satisfy the lemma of Duhamel, namely that for every $\epsilon > 0$ there exists a number $n_2(\epsilon)$ such that the inequality $n > n_2(\epsilon)$ implies

$$ \left|\frac{p_1(k \mid n)}{p_2(k \mid n)} - 1\right| < \epsilon \tag{4.6.26} $$

for all values of $k$ between the limits $A(n) \le k \le B(n)$. This follows from (4.6.23) and from (4.6.25) because the two formulae imply that for sufficiently large values of $n$,

$$ e^{-M(t_1, t_2, p)/\sqrt{n}} < \frac{p_1(k \mid n)}{p_2(k \mid n)} < e^{M(t_1, t_2, p)/\sqrt{n}}. \tag{4.6.27} $$


In order to satisfy (4«6*26) for all the requisite values of k it is sufficient 
to select n large enough so that the two extreme sides in (4*6*27) differ 
from unity by less than €. 
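The uniform closeness asserted in (4·6·26) and (4·6·27) can be watched numerically. The following Python sketch is only an illustration; it assumes, as we read the text, that $x_k(n) = (k - np)\Delta_n$ with $\Delta_n = 1/\sqrt{np(1-p)}$, and it takes $A(n)$ and $B(n)$ to be the smallest and largest $k$ with $t_1 < x_k(n) \le t_2$ (our reading of (4·6·21) and (4·6·22)). The function names and the values $p = .3$, $t_1 = -1$, $t_2 = 1.5$ are our own choices.

```python
import math

def k_range(n, p, t1, t2):
    """Smallest and largest k with t1 < x_k(n) <= t2 (assumed reading of A(n), B(n))."""
    sd = math.sqrt(n * p * (1 - p))
    a = math.floor(n * p + t1 * sd) + 1   # first k with k > np + t1*sd
    b = math.floor(n * p + t2 * sd)       # last k with k <= np + t2*sd
    return max(a, 0), min(b, n)

def p1(k, n, p):
    """Exact binomial probability, computed through logarithms to avoid overflow."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log(1 - p))

def p2(k, n, p):
    """Normal term Delta_n / sqrt(2*pi) * exp(-x_k(n)^2 / 2)."""
    delta = 1 / math.sqrt(n * p * (1 - p))
    x = (k - n * p) * delta
    return delta / math.sqrt(2 * math.pi) * math.exp(-x * x / 2)

def max_ratio_error(n, p, t1, t2):
    """max over A(n) <= k <= B(n) of |p1/p2 - 1|, the quantity bounded in (4.6.26)."""
    a, b = k_range(n, p, t1, t2)
    return max(abs(p1(k, n, p) / p2(k, n, p) - 1) for k in range(a, b + 1))

for n in (100, 1_000, 10_000, 100_000):
    print(n, max_ratio_error(n, p=0.3, t1=-1.0, t2=1.5))
```

The printed maxima should shrink roughly like $1/\sqrt{n}$, in line with the bound $M(t_1, t_2, p)/\sqrt{n}$ in the exponents of (4·6·27).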

Using the lemma of Duhamel we conclude that, if it is established that the sum

$$P_2(t_1, t_2, n) = \sum_{k=A(n)}^{B(n)} p_2(k \mid n) \qquad (4\cdot6\cdot28)$$

tends to a finite limit $L$ as $n$ is increased, then the same number $L$ is the limit of the corresponding sum of the terms $p_1(k \mid n)$ and, by another application of the lemma of Duhamel, also the limit of the probability $P(t_1, t_2, n)$. Thus, in order to complete the proof of the theorem of Laplace it is sufficient to show that

$$\lim_{n\to\infty} P_2(t_1, t_2, n) = \lim_{n\to\infty} \sum_{k=A(n)}^{B(n)} \frac{\Delta_n}{\sqrt{2\pi}}\, e^{-\frac12 x_k^2(n)} = \int_{t_1}^{t_2} \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 x^2}\, dx. \qquad (4\cdot6\cdot29)$$

We shall achieve this end by constructing special grids and computing specially devised Riemann sums which tend to the integral in the right-hand side of (4·6·29). It will appear that for each $n$, the sum $P_2(t_1, t_2, n)$ differs from the corresponding Riemann sum by a quantity which tends to zero as $n \to \infty$.




Consider any $n$ and the corresponding sum $P_2(t_1, t_2, n)$. Let $m(n)$ denote the number of terms in this sum. Obviously,

$$m(n) = B(n) - A(n) + 1.$$

Denote by $G[m(n)]$ the grid composed of $m(n)$ cells, extending from $t_1$ to $t_2$ and defined as follows. The first point of the grid is $a_0(n) = t_1$. The next point is $a_1(n) = a_0(n) + \Delta_n$, where $\Delta_n$ has the meaning defined previously, namely $\Delta_n = 1/\sqrt{np(1-p)}$. Generally, let the $(k+1)$st point of the grid be

$$a_k(n) = a_{k-1}(n) + \Delta_n$$

for all values of $k = 1, 2, \dots, m(n) - 1$. The last point of the grid $G[m(n)]$ will be $a_{m(n)}(n) = t_2$.

According to this definition, the first $m(n) - 1$ cells of the grid have the same length $\Delta_n$. The length of the last cell, say $\delta(n)$, may be equal to, smaller than or greater than $\Delta_n$. In fact

$$\delta(n) = t_2 - a_{m(n)-1}(n) = t_2 - t_1 - [m(n) - 1]\Delta_n = t_2 - t_1 - [B(n) - A(n)]\Delta_n.$$

In order to obtain limits for $[B(n) - A(n)]\Delta_n$, we recall the meaning of $\Delta_n$ and use formulae (4·6·21) and (4·6·22) and obtain

$$t_2 - t_1 - 2\Delta_n \le [B(n) - A(n)]\Delta_n < t_2 - t_1.$$

It follows that

$$0 < \delta(n) \le 2\Delta_n.$$

Thus, when $n \to \infty$, the length of the last cell in $G[m(n)]$ also tends to zero. We refer to subsection 4·6·1 and conclude that the grids $G[m(n)]$ constructed for ever-increasing values of $n$ may serve for the computation of Riemann sums and, in whichever way we choose the designated points within each cell, the corresponding Riemann sums will tend to the integral in the right-hand side of (4·6·29). Using this freedom of selecting designated points, we notice that $x_{A(n)}(n)$ lies within the first cell of the grid $G[m(n)]$. This follows from (4·6·21). Thus we select $x_{A(n)}(n)$ as the designated point of the first cell. Further, we recall that the points $x_k(n)$ of (4·6·20) are all equidistant and that the interval between them is equal to $\Delta_n$, which is also the length of each of the first $m(n) - 1$ cells of the grid. It follows that the points $x_k(n)$ are distributed so that one is in each of the $m(n)$ cells of the grid, namely that

$$a_{k-1}(n) < x_{A(n)+k-1}(n) \le a_k(n)$$

for $k = 1, 2, \dots, m(n)$. Accordingly, we choose the point $x_{A(n)+k-1}(n)$ as the designated point of the cell $(a_{k-1}(n), a_k(n))$ and build a Riemann sum, say $P_2^*(t_1, t_2, n)$. Obviously,

$$P_2^*(t_1, t_2, n) = \sum_{k=A(n)}^{B(n)-1} \frac{\Delta_n}{\sqrt{2\pi}}\, e^{-\frac12 x_k^2(n)} + \frac{\delta(n)}{\sqrt{2\pi}}\, e^{-\frac12 x_{B(n)}^2(n)}.$$

Since the maximum length of the cells of the grid underlying the Riemann sum $P_2^*(t_1, t_2, n)$ tends to zero, it follows that

$$\lim_{n\to\infty} P_2^*(t_1, t_2, n) = \int_{t_1}^{t_2} \frac{1}{\sqrt{2\pi}}\, e^{-\frac12 x^2}\, dx. \qquad (4\cdot6\cdot30)$$

However, comparing $P_2(t_1, t_2, n)$ with $P_2^*(t_1, t_2, n)$, we find that

$$\left|P_2(t_1, t_2, n) - P_2^*(t_1, t_2, n)\right| = \frac{|\Delta_n - \delta(n)|}{\sqrt{2\pi}}\, e^{-\frac12 x_{B(n)}^2(n)} \le \frac{\Delta_n}{\sqrt{2\pi}} = \frac{1}{\sqrt{2\pi n p(1-p)}},$$

because $0 < \delta(n) \le 2\Delta_n$ implies $|\Delta_n - \delta(n)| \le \Delta_n$, and the exponential factor does not exceed unity. Thus, as $n \to \infty$, the difference between $P_2(t_1, t_2, n)$ and $P_2^*(t_1, t_2, n)$ tends to zero. Therefore, since $P_2^*(t_1, t_2, n)$ tends to the integral (4·6·30), $P_2(t_1, t_2, n)$ must tend to the same integral also. This completes the proof of the theorem of Laplace.
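The grid construction can be traced step by step on a computer. The Python sketch below is illustrative only: it builds the grid $G[m(n)]$, checks that each point $x_{A(n)+k-1}(n)$ falls in its own cell, forms $P_2$ and the Riemann sum $P_2^*$, and compares their difference with the bound $1/\sqrt{2\pi np(1-p)}$. As before, it assumes $x_k(n) = (k - np)\Delta_n$ and reads $A(n)$, $B(n)$ as the extreme values of $k$ with $t_1 < x_k(n) \le t_2$; the function names and numerical values are our own.

```python
import math

def phi(x):
    """Standard normal density 1/sqrt(2*pi) * exp(-x^2 / 2)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def normal_integral(t1, t2):
    """Integral of phi from t1 to t2, the right-hand side of (4.6.29)."""
    return 0.5 * (math.erf(t2 / math.sqrt(2)) - math.erf(t1 / math.sqrt(2)))

def laplace_sums(n, p, t1, t2):
    """Return P_2, the Riemann sum P_2^*, delta(n), Delta_n and a cell-membership check."""
    sd = math.sqrt(n * p * (1 - p))
    big_delta = 1 / sd                                    # Delta_n
    a = math.floor(n * p + t1 * sd) + 1                   # A(n): first k with x_k(n) > t1 (assumed)
    b = math.floor(n * p + t2 * sd)                       # B(n): last k with x_k(n) <= t2 (assumed)
    xs = [(k - n * p) * big_delta for k in range(a, b + 1)]   # x_A(n), ..., x_B(n)
    m = b - a + 1                                         # m(n) = B(n) - A(n) + 1
    small_delta = t2 - t1 - (b - a) * big_delta           # delta(n), length of the last cell
    grid = [t1 + j * big_delta for j in range(m)] + [t2]  # a_0(n), ..., a_m(n)
    in_cells = all(grid[j] < xs[j] <= grid[j + 1] for j in range(m))
    p2_sum = sum(big_delta * phi(x) for x in xs)          # P_2(t1, t2, n)
    riemann = sum(big_delta * phi(x) for x in xs[:-1]) + small_delta * phi(xs[-1])  # P_2^*
    return p2_sum, riemann, small_delta, big_delta, in_cells

for n in (100, 10_000, 1_000_000):
    p2_sum, riemann, small_delta, big_delta, in_cells = laplace_sums(n, 0.3, -1.0, 1.5)
    bound = big_delta / math.sqrt(2 * math.pi)            # 1 / sqrt(2*pi*n*p*(1-p))
    print(n, in_cells, abs(p2_sum - riemann) <= bound,
          abs(riemann - normal_integral(-1.0, 1.5)))
```

The last column, the distance of $P_2^*$ from the normal integral, should decrease as $n$ grows, as (4·6·30) requires.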

PROBLEMS AND EXERCISES 

Consider a certain liquid in a container $C$. There are $N$ bacteria in this liquid. The liquid will be vigorously shaken and then part of it will be transferred into a test tube. Let $V_0$ denote the volume of the liquid in the container, and let $V_1 < V_0$ denote the volume of the part which will be poured out into the test tube.

It is generally accepted that the probability $p$ that any given bacterium will be transferred into the test tube is equal to the ratio of the volumes $V_1/V_0$. Moreover, it is also generally accepted that the appearance of one particular bacterium in the test tube is completely independent of the appearance of the $N - 1$ other bacteria. In other words, it is ordinarily postulated that the machinery behind the transfer of a certain number of bacteria into the test tube is equivalent to a sequence of $N$ completely independent trials with the probability of "success" in each trial equal to $p = V_1/V_0$.

In the following problems it will always be assumed that $N$ is a very large number, in thousands or in millions. The letter $m$ will be used to denote the "bacterial density" in $C$, that is, $m$ is the average number of bacteria in $C$ per unit volume, $m = N/V_0$.

1. Assume that the volume $V_1$ of 3 cc. poured into the test tube is very small compared to $V_0$, so that $p = V_1/V_0$ is a small number, perhaps $p = .0001$. Assume further that the bacterial density $m$ is only moderately large, perhaps $m = 2.5$ bacteria per cc. Compute the approximate value of the probability that the number $X$ of bacteria in the liquid will be $k = 0, 1, 2$, etc.

Partial Answer: $P\{X = k\} \approx \dfrac{(7.5)^k}{k!}\, e^{-7.5}$.

2. Assume that the volume $V_1$ poured into the test tube is comparable to the original volume $V_0$, so that $p$ is not very small, perhaps $p = .25$. Let $X$ be the number of bacteria transferred into the test tube and $Y = X/V_1$ the resulting bacterial density in the tube. Compute a function of $N$ representing the approximate value of the probability $P\{|Y - m| < \tau \mid N\}$, where $\tau$ is a given number. Specifically, put $p = .25$, $\tau = .01$ and $N = 100{,}000$.

Partial Answer: $\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-t}^{+t} e^{-x^2/2}\, dx$, with $t = \dots$
y/2v J-t 

3. Consider the case when the bacteria in question are toxic, so that if an experimental animal (e.g., a mouse) receives an injection containing a certain number $t$ (or more) of bacteria, then it dies. Assume that the amount of liquid injected is the unit volume.

Under the conditions of Problem 1 compute the probability, say $P(t, m)$, of the death of a mouse assuming $t = 1, 2, 3$. Substitute for $m$ the values 8, 4, 2, 1, .5, .25, and .125. For each value of $t$ make a graph of the probability $P(t, m)$ considered as a function of $m$.

4. Under the general conditions of Problem 3, assume that 10 mice are to be injected with a unit volume of liquid with bacterial density $m$ on the average. Denote by $X(t, m)$ the number of deaths among the ten mice. What is the distribution function of $X(t, m)$? Compute the most probable value of $X(t, m)$ if $t = 1, 3$, and $m = 8, 2, .5$, and $.125$. Make plots separately for $t = 1$ and $t = 3$.

5. While working with some unknown toxic bacteria, experimenters injected a large number $N$ of mice each with the same dose of liquid containing these bacteria. The resulting death-rate was .95. This high death rate may be ascribed to either of two sets of circumstances: (i) the liquid injected was of relatively low bacterial density, but the toxicity of each bacterium was very strong; (ii) the liquid injected had a high concentration of mildly toxic bacteria. Inspect the curves constructed in solving Problems 3 and 4, and suggest an experiment which may help to distinguish between the situations (i) and (ii).

6. Let $R_0$ be a large region in the sky away from the Milky Way and let $R_1$ be a region contained in $R_0$. It is ordinarily assumed that, if there are $N$ visible stars in the region $R_0$, then the probability of there being exactly $k$ stars in the smaller region $R_1$ is given by the binomial formula,

$$C_N^k\, p^k (1-p)^{N-k},$$

where $p$ is the ratio of the area of $R_1$ to the area of $R_0$.

(a) Let $R_0$ be a 10° × 10° region containing $N = 10{,}000$ bright stars. Write down a formula representing the approximate value of the probability that a region $R_1$ with dimensions 10′ × 10′ will contain exactly $k$ stars.

(b) A photograph represents the 10′ × 10′ region. What is the probability that at least three of the bright stars will appear on this photograph?

(c) How large a photograph is required to ensure that the probability of at least one bright star on the photograph is larger than .90?

(d) If I take a photograph of the entire region and divide it up into 1′ × 1′ cells, then what is the probability that at least one of these cells contains more than one star? Hint: to compute this probability use the direct method, starting with the definition of the F.P.S.

7. A new method of diagnosis of cancer of the lung is based on examinations of specimens of sputum. Treating the problem in a very simplified manner, assume that the probability that a sputum of a patient actually ill with cancer of the lung contains cancerous cells is, say, $p_1 = .5$. Assume further that, given that a sputum contains cancerous cells, the probability that some of these cells will be transferred onto a slide is $p_2 = .3$. Finally, assume that, given that a slide contains some cancerous cells, the probability that these cells will be identified by the pathologist is $p_3 = .95$.

(a) The doctor orders a patient to deliver $r$ samples of sputum expectorated on $r$ different occasions. From each of these specimens there will be made $s$ slides, and each slide will be inspected twice. Assume that the patient does suffer from cancer of the lung, and write the probability $P(r, s)$ that at least one of the $2rs$ readings of the slides will lead to the verdict "positive."

(b) A doctor wonders how to determine the values of $r$ and $s$ so as to keep the amount of labor constant and to increase the probability of detecting cancer if it exists. In particular, he considers three systems: (i) $r = 1$, $s = 15$; (ii) $r = 3$, $s = 5$; and (iii) $r = 15$, $s = 1$. Which of these systems is preferable?
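Several of the problems above reduce to Poisson and binomial arithmetic, and a few lines of Python may serve to check hand computations. This is a sketch under stated assumptions, not a set of official solutions: it uses the Poisson approximation with mean $\lambda = mV$ for the number of bacteria (or stars) in a volume (or area) $V$, treats the region of the sky in Problem 6 as flat when computing area ratios, and in Problem 7 assumes that all samples, slides, and readings behave independently. All function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability lam^k e^(-lam) / k!."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binom_pmf(k, n, p):
    """Binomial probability C(n, k) p^k (1-p)^(n-k), computed through logarithms."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log(1 - p))

# Problem 1: V1 = 3 cc, p = V1/V0 = .0001, m = 2.5 per cc, hence N = m * V0 = m * V1 / p.
p, m, v1 = 1e-4, 2.5, 3.0
n_bacteria = round(m * v1 / p)            # N = 75,000
lam1 = n_bacteria * p                     # Np = m * V1 = 7.5
max_gap = max(abs(binom_pmf(k, n_bacteria, p) - poisson_pmf(k, lam1)) for k in range(20))

# Problem 3: a mouse dies if the unit-volume dose holds at least t bacteria (Poisson, mean m).
def death_prob(t, m):
    return 1 - sum(poisson_pmf(k, m) for k in range(t))

# Problem 4: deaths among 10 mice are binomial(10, P(t, m)); floor((n + 1) P) is a mode.
def most_probable_deaths(t, m, mice=10):
    return min(math.floor((mice + 1) * death_prob(t, m)), mice)

# Problem 6: flat-sky approximation, R0 = 10 deg x 10 deg = 600' x 600', N = 10,000 stars.
AREA_R0 = 600.0 * 600.0
def star_mean(side_arcmin, n_stars=10_000):
    return n_stars * side_arcmin ** 2 / AREA_R0

lam6 = star_mean(10.0)                                            # (a): mean 25/9 for 10' x 10'
p_at_least_3 = 1 - sum(poisson_pmf(k, lam6) for k in range(3))   # (b)
side_c = math.sqrt(math.log(10) * AREA_R0 / 10_000)              # (c): side with 1 - e^(-lam) = .90

# Problem 7: every sputum, slide and reading assumed independent of the others.
def p_detect(r, s, p1=0.5, p2=0.3, p3=0.95):
    slide = p2 * (1 - (1 - p3) ** 2)        # slide receives cells and >= 1 of 2 readings finds them
    sample = p1 * (1 - (1 - slide) ** s)    # sputum has cells and >= 1 of s slides is positive
    return 1 - (1 - sample) ** r            # >= 1 of r samples is positive

print(n_bacteria, lam1, max_gap)
print([[round(death_prob(t, mm), 4) for mm in (8, 4, 2, 1, .5, .25, .125)] for t in (1, 2, 3)])
print(most_probable_deaths(1, 8), most_probable_deaths(3, 0.5))
print(round(p_at_least_3, 4), round(side_c, 2))
print({(r, s): round(p_detect(r, s), 4) for r, s in ((1, 15), (3, 5), (15, 1))})
```

Under the independence assumption of the sketch, spreading the fifteen slides over more separate samples raises the detection probability, since no number of slides from a single sputum can overcome the factor $p_1 = .5$.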





CHAPTER V.


Elements of the Theory of Testing
Statistical Hypotheses


5·1. Statistical hypotheses and their tests

5·1·1. Basic ideas. In the first chapter of this book we introduced the distinction between problems on probability in general and the particular category of these problems that are the subject matter of mathematical statistics.

When we begin with assumptions concerning the probabilities of $A, B, \dots, C$ and deduce from them the probability of $D$, then, whatever be the meaning of the letters $A, B, \dots, C$ and $D$, the problem treated is that of probability. Mathematical statistics is concerned with the subclass of these problems that relate to rules of inductive behavior. With a few exceptions, all problems discussed in Chapters 2 through 4 are probabilistic problems, without any reference to rules of inductive behavior. The concepts and methods developed in these chapters will now be used to introduce the student to problems of mathematical statistics. The most frequent of these problems is that of testing statistical hypotheses. Let $X$ stand for a random variable capable of assuming the values

$$u_1 < u_2 < \dots < u_n$$

and no others. As formerly, we will use the symbol $p_X(t)$ to denote the frequency function of the random variable $X$. In this chapter we shall be particularly concerned with situations where (i) some trials or observations will yield the values of one or more random variables and where (ii) the frequency functions of these random variables are unknown. The random variables whose particular values will be determined by observation are described as observable random variables.

Definition 5·1. Every assumption concerning the frequency functions of observable random variables is called a statistical hypothesis.

The concept of statistical hypothesis is very important and we will illustrate it with several examples.

Example 1. Consider the hypothesis, say $H_M$, asserting that there is life on the planet Mars. Stated in this form, $H_M$ is not a statistical hypothesis because it is not an assumption concerning the frequency function of any random variable.




Example 2. After the discovery of a new comet $C$, the hypothesis $H_P$ is made that the comet's orbit is a parabola. To test this hypothesis a series of measurements of the comet's coordinates will be made. Let $X_1, X_2, \dots, X_n$ denote the results of these measurements. For the purposes of this example it will be assumed that the measurements are made with absolute accuracy.

The hypothesis $H_P$ is concerned with the observable variables $X_1, X_2, \dots, X_n$; yet the description of the situation does not imply that these variables are random variables. On the contrary, it implies that each $X_i$ is a perfectly determined sure function of the time, and the hypothesis $H_P$ specifies the nature of this function. Thus, $H_P$ is not an assumption concerning the frequency function of the observable random variables and, therefore, is not a statistical hypothesis.

However, $H_P$ can be transformed into a statistical hypothesis. One way of achieving this is to admit that the measurements relating to the position of the comet are subject to error and to postulate that these errors have a random character. As a result, the variables $X_1, X_2, \dots, X_n$ will be observable random variables, and the assumption $H_P$ that the true coordinates of the comet correspond to a parabolic orbit will become a statistical hypothesis.

Example 3. We shall toss a coin ten times, taking particular care that all the tosses are performed in exactly the same manner and that the outcome of one toss does not influence the outcomes of the following tosses. The letter $X$ will denote the number of tosses yielding "heads."

Under the above conditions it is usual to postulate that $X$ is a binomial random variable capable of assuming the eleven integer values from zero to ten. The frequency function $p_X(k)$ is generated by the expansion of the binomial $(q + pu)^{10}$, where $p = 1 - q$ stands for the probability of the coin falling heads in any particular toss.

Ten tosses actually performed will determine a particular value of $X$. Thus $X$ is an observable random variable. If our information about the tosses is limited to what was just described, then the value of the probability $p$ is unknown and every assumption about its value constitutes a statistical hypothesis. In particular, we may consider the following statistical hypotheses:

$H_1$ asserts that $p = \tfrac12$,

$H_2$ asserts that $\tfrac14 \le p \le \tfrac34$,

etc. It will be observed that the postulates underlying Example 3 imply a perfectly determined form of the frequency function

$$p_X(k) = C_{10}^k\, p^k q^{10-k} \qquad (5\cdot1\cdot1)$$

of the observable random variable $X$. The only thing that is subject to doubt is the value of the parameter $p$, which may conceivably be any number between zero and unity. In these circumstances every statistical hypothesis concerning Example 3 must reduce itself to a more or less restrictive assumption concerning the value of $p$. Example 4 illustrates a situation of somewhat more general character.

Example 4. Consider again a set of ten tosses of a coin, with $X$ standing for the number of heads. However, in this Example nothing will be assumed as definitely known about the method of tossing except that the outcome of each toss is random. In particular, we will admit the possibility that the method of tossing may vary from one toss to another with a resulting change in the probability of the coin falling heads. In these circumstances, the only thing that is known about the observable random variable $X$ is that it is capable of assuming integer values from zero to ten and we may consider the following statistical hypotheses:

$H_3$ asserts that $X$ is a binomial variable with frequency function (5·1·1).

$H_4$ asserts that $X$ is a binomial variable with frequency function (5·1·1) and $p = \tfrac12$.

$H_5$ asserts that $X$ is a binomial variable with frequency function (5·1·1) and with $p$ between the limits $\tfrac14 \le p \le \tfrac34$.

$H_6$ asserts that the probability of $X$ exceeding 3 is greater than one-half, so that $P\{X > 3\} > \tfrac12$.

It will be observed that the postulates underlying Example 4 do not imply any particular form of the frequency function of the observable random variable. Thus in this situation there is room for the statistical hypothesis $H_3$ asserting that the frequency function $p_X(k)$ is given by (5·1·1). However, a statistical hypothesis may be even more general than $H_3$ and, as is the case with $H_6$, may be restricted to a very mild assumption concerning $p_X(k)$ with its functional form unspecified.

When simultaneously discussing several statistical hypotheses concerning the same random variable $X$, it is convenient to include in the notation for the frequency function a reference to the particular hypothesis by which this frequency function is defined. We will use the symbol $p_X(k \mid H)$ to denote the frequency function of $X$ as determined by the hypothesis $H$. With this notation we have

$$p_X(k \mid H_1) = p_X(k \mid H_4) = C_{10}^k \left(\tfrac12\right)^{10}$$

for $k = 0, 1, 2, \dots, 10$ and

$$p_X(k \mid H_1) = p_X(k \mid H_4) = 0$$

for all other values of the argument $k$. Also, the non-zero values of the frequency function $p_X(k \mid H_2)$ are represented by formula (5·1·1) with $p$ between the limits $\tfrac14 \le p \le \tfrac34$.

It will be noticed that the values of the frequency function $p_X(k \mid H_1)$ are perfectly determined for all values of the argument $k$. On the other hand, the frequency function of $X$ as specified by the hypothesis $H_2$ is not perfectly determined. For example, for $k = 0$

$$p_X(0 \mid H_2) = q^{10}$$

may have any numerical value between the limits

$$\left(\tfrac14\right)^{10} \le p_X(0 \mid H_2) \le \left(\tfrac34\right)^{10}.$$

This is an important distinction between the hypotheses $H_1$ and $H_2$. It leads to the following definition.


Definition 5·2. If a statistical hypothesis $H$ entirely specifies the frequency function of the observable random variable (or variables) as a single-valued function of its argument (arguments), then $H$ is called a simple statistical hypothesis. Every statistical hypothesis which is not simple is called composite.


Of the hypotheses discussed in the foregoing examples, hypotheses $H_1$ and $H_4$ are simple. On the other hand, hypotheses $H_2$, $H_3$, $H_5$, and $H_6$ are composite.

The term "simple hypothesis" is very intuitive and does not require special justification. To justify the term "composite" applied to any hypothesis which is not simple, we notice that, if a hypothesis $H$ is not simple, then it is "composed" of several simple hypotheses, frequently of an infinity, and is their logical sum. For example, the hypothesis $H_2$ is the logical sum of all simple hypotheses ascribing to $p$ arbitrary particular values between $\tfrac14$ and $\tfrac34$, such as $p = .500$, $p = .499$, $p = .501$, etc.

Also, hypothesis $H_3$ is the logical sum of simple hypotheses such as the following: $X$ is a binomial variable and $p = .5$; or $X$ is a binomial variable and $p = .25$; or $X$ is a binomial variable and $p = .75$, etc. It is seen that each of the simple hypotheses just stated is obtained from $H_3$ by adding to $H_3$ a supplementary assumption concerning the value of $p$.
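The distinction drawn in Definition 5·2 between the simple hypothesis $H_1$ and the composite hypothesis $H_2$ can be made concrete with exact fractions. In the Python sketch below, which is only an illustration, $H_1$ yields one list of eleven probabilities, while the value $p_X(0 \mid p) = (1-p)^{10}$ moves over an interval as $p$ runs through a grid of points in $[\tfrac14, \tfrac34]$; the grid spacing of $.01$ is our own arbitrary choice.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(p, n=10):
    """Frequency function (5.1.1): C(n, k) p^k (1 - p)^(n - k) for k = 0..n."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

# H1 (simple): p = 1/2 fixes every value of the frequency function exactly.
h1 = binomial_pmf(Fraction(1, 2))

# H2 (composite): 1/4 <= p <= 3/4; each grid value of p is one simple hypothesis of H2.
grid = [Fraction(1, 4) + Fraction(j, 100) for j in range(51)]   # 1/4, .26, ..., 3/4
values_at_0 = [binomial_pmf(p)[0] for p in grid]

print(h1[0], min(values_at_0), max(values_at_0))
```

The smallest and largest values of $p_X(0 \mid p)$ on the grid are $(\tfrac14)^{10}$ and $(\tfrac34)^{10}$, the limits quoted for $p_X(0 \mid H_2)$.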


Definition 5·3. We say that a simple hypothesis $h$ belongs to the composite hypothesis $H$ if $h$ can be obtained from $H$ by adding to $H$ supplementary assumptions which do not contradict $H$.

In the meaning of this definition, hypothesis $H_1$ of Example 3 "belongs" to hypothesis $H_2$. In fact, $H_1$ can be obtained from $H_2$ by adding to $H_2$ the supplementary assumption that $p = \tfrac12$. It is obvious that every composite hypothesis $H$ is the logical sum of the simple hypotheses belonging to $H$.

The relation of hypothesis $H_1$ in Example 3 and of hypothesis $H_4$ in Example 4 requires special attention and comment. It will be seen that $H_4$ contains two uncertain assumptions: (i) $X$ is a binomial variable and (ii) $p = \tfrac12$. On the other hand, the only uncertain assumption in $H_1$ is (ii) $p = \tfrac12$. Yet, as we have seen above, both hypotheses $H_1$ and $H_4$ ascribe to the observable random variable $X$ the same frequency function. In these circumstances a convention is necessary as to whether to consider $H_1$ and $H_4$ as identical hypotheses or not. We shall adopt the convention that a statistical hypothesis is identified by the properties of the distribution it ascribes to the observable random variable or variables, irrespective of elements of uncertainty involved. With this convention, $H_1$ and $H_4$ are identical statistical hypotheses. Also hypotheses $H_2$ and $H_5$ are identical. The difference in the uncertain assumptions these hypotheses contain must be accounted for by using a new concept. This new concept is that of the set of admissible simple hypotheses.

In Examples 3 and 4, the formulation of the statistical hypotheses was preceded by a general description of the situation leading to the adoption of several postulates. The omnipresent postulate, without which no probabilistic problem relating to actual phenomena is possible, is that the observable variables are random variables. Apart from this, however, the description of the situation in both Examples led us to accept as granted, or to postulate, that the variable $X$ can assume only the integer values from zero to ten and no others. Moreover, the conditions of Example 3 caused us to postulate in addition that the observable random variable is a binomial variable.

It will be noticed that the adoption of similar postulates, additional to the basic one that the observable variables are random variables, is equivalent to the definition of a set of simple hypotheses which in any particular case are "admissible." In the discussion of tests of statistical hypotheses, the sets of admissible hypotheses play an important role, so that it is essential to define these sets with precision.

A set of admissible hypotheses is usually denoted by the Greek letter capital omega, $\Omega$, if necessary supplemented with subscripts. The sets of simple hypotheses, say $\Omega_1$ and $\Omega_2$, which are considered admissible in Examples 3 and 4, respectively, can be defined explicitly as follows:

$\Omega_1$ is made up of all simple hypotheses which (i) assert that $X$ is a binomial variable with possible values $0, 1, 2, \dots, 10$, and (ii) ascribe arbitrary values $0 \le p \le 1$ to the probability $p$. Roughly speaking, there are as many simple hypotheses in the set $\Omega_1$ as there are numbers between zero and unity.

$\Omega_2$ is made up of all the simple hypotheses $h$ which (i) assert that $X$ cannot assume values other than $k = 0, 1, 2, \dots, 10$, and (ii) ascribe arbitrary non-negative values to the frequency function of $X$,

$$p_X(k \mid h) = a_k(h) \quad \text{for } k = 0, 1, \dots, 10,$$

with only the restriction

$$\sum_{k=0}^{10} a_k(h) = 1.$$

It is obvious that the set $\Omega_2$ includes $\Omega_1$. On the other hand, there are simple hypotheses which belong to $\Omega_2$ but not to $\Omega_1$. Such, for example, is the hypothesis $h'$ which asserts that

$$\begin{cases} p_X(k \mid h') = \dots & \text{for } k = 0, 1, \dots, 9, \\ p_X(10 \mid h') = \dots \end{cases} \qquad (5\cdot1\cdot2)$$

It is easy to see that if $h'$ is true, then $X$ is not a binomial variable. In fact, formulae (5·1·2) imply that the quotient $Q_k = p_X(k \mid h')/p_X(k-1 \mid h')$ has one and the same value for $k = 1, 2, \dots, 9$. However, in subsection 4·2·3, it was proved that the quotient $Q_k$ for the binomial variable is necessarily a decreasing function of $k$. Therefore, if $h'$ is true, then $X$ is not a binomial variable. Consequently, the simple hypothesis $h'$ does not belong to the set $\Omega_1$.
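The argument rests on the behavior of the quotient $Q_k$ for a binomial variable. For the frequency function (5·1·1) one finds $Q_k = \dfrac{(10-k+1)\,p}{k\,(1-p)}$, which strictly decreases in $k$ whenever $0 < p < 1$, so no binomial law can give the same quotient for $k = 1, 2, \dots, 9$. A short Python check, with an arbitrary sample of values of $p$ chosen by us:

```python
from fractions import Fraction
from math import comb

def binomial_quotients(p, n=10):
    """Q_k = p_X(k) / p_X(k - 1) for k = 1..n when X is binomial(n, p), 0 < p < 1."""
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    return [pmf[k] / pmf[k - 1] for k in range(1, n + 1)]

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    q = binomial_quotients(p)
    # strictly decreasing, so Q_1 = Q_2 = ... = Q_9 is impossible for a binomial law
    print(p, all(a > b for a, b in zip(q, q[1:])))
```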

Using the concept of the set of admissible simple hypotheses we can now say that the difference in the uncertain assumptions involved in the same hypothesis $H_1 = H_4$ considered in Example 3 and in Example 4 is due to the fact that the sets of admissible simple hypotheses contemplated in these examples are different. The same applies to the hypothesis $H_2 = H_5$.

The last concept to be introduced in this subsection is that of the sample space. When discussing observable random variables it is often convenient to use geometric language and to interpret the values of the random variables as coordinates of points.

First, consider the case of just one observable random variable $X$. Let $u_1 < u_2 < \dots < u_n$ be all the values which $X$ can assume, so that an observation of $X$ may yield any one of these values and no others. Then, to interpret geometrically the possible outcome of an observation on $X$ we need just one axis of coordinates, say $Ox$, on which $n$ points with abscissae $u_1, u_2, \dots, u_n$ can be marked. The observation of $X$ will determine one of these points, say $X = u_k$. This point will be described as the "sample point" or alternatively as the "event point." The $n$ points with abscissae $u_1, u_2, \dots, u_n$ are the possible positions of the sample point, or simply, the possible sample points. The set of these points is called the sample space of the variable $X$.

The three terms, sample point, possible sample point, and sample space, 
are used irrespective of the number of variables considered. The sample 
point is ordinarily denoted by E ( = event point), a possible position of 
the sample point is denoted by e and the sample space by W. Figure 28 
illustrates the concept of the sample space of two variables. 

Let $X_1$ and $X_2$ be two random variables whose particular values will be determined simultaneously by a trial or observation. Let $X_1$ be capable of assuming any one of $n_1$ different values, say,

$$u_{11} < u_{12} < \dots < u_{1n_1}. \qquad (5\cdot1\cdot3)$$

Similarly, let $X_2$ be capable of assuming any one of $n_2$ different values, say,

$$u_{21} < u_{22} < \dots < u_{2n_2}. \qquad (5\cdot1\cdot4)$$

The observation will yield particular values of each variable, say $X_1 = u_{1i}$ and $X_2 = u_{2j}$. These values, then, determine the sample point $E$. To interpret the result of the observation geometrically, it is convenient to use two axes, say, $Ox_1$ and $Ox_2$ as in Figure 28. On the axis $Ox_1$ we mark all the possible values (5·1·3) of $X_1$. On the axis $Ox_2$ we mark the possible values (5·1·4) of $X_2$. The heavy dots in Figure 28 mark points with coordinates $(u_{1i}, u_{2j})$ for $i = 1, 2, \dots, n_1$ and $j = 1, 2, \dots, n_2$. These points are the possible sample points $e$, and the set of these points is the sample space $W$ of the two random variables $X_1$ and $X_2$. It will be observed that the sample space $W$ is composed of $n_1 n_2$ possible sample points.

Figure 28. Sample Space $W$ of Two Observable Random Variables $X_1$ and $X_2$.

The generalization of the concepts of sample point and sample space for the case of more than two observable random variables is easy. Let there be $s$ random variables, $X_1, X_2, \dots, X_i, \dots, X_s$, and assume that a trial or a system of observations will determine the values of all of them simultaneously. Let $n_i$ stand for the total number of different values, say

$$u_{i1} < u_{i2} < \dots < u_{in_i},$$

which may be assumed by $X_i$, for $i = 1, 2, \dots, s$. Then the observation will yield $s$ numbers, say

$$X_1 = u_{1\alpha_1},\quad X_2 = u_{2\alpha_2},\quad \dots,\quad X_i = u_{i\alpha_i},\quad \dots,\quad X_s = u_{s\alpha_s},$$

where $\alpha_1$ is any one of the numbers $1, 2, \dots, n_1$, and $\alpha_2$ is any one of the numbers $1, 2, \dots, n_2$, etc. The particular values of the $X_i$ so determined are again described as coordinates of the sample point $E$. The number of possible sample points $e$ is equal to the product $n_1 n_2 \cdots n_s$. The set of all possible sample points is called the sample space of the random variables $X_1, X_2, \dots, X_s$ and denoted by $W$. The frequency function of the variables $X_1, X_2, \dots, X_s$ taken at a possible sample point $e$ will be denoted simply by $p(e \mid H)$, etc.

The concepts introduced in this subsection are illustrated in numerous 
examples discussed below. 
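Since a possible sample point is simply one choice of value for each variable, the sample space $W$ of several discrete variables is the Cartesian product of their sets of possible values, which is why it contains $n_1 n_2 \cdots n_s$ points. A minimal Python sketch, in which the value sets are made-up stand-ins for (5·1·3) and (5·1·4):

```python
from itertools import product
from math import prod

def sample_space(*value_sets):
    """All possible sample points e = (u_{1 a_1}, ..., u_{s a_s})."""
    return list(product(*value_sets))

# Hypothetical example: X1 may take 3 values, X2 may take 4 values.
x1_values = [0, 1, 2]
x2_values = [10, 20, 30, 40]
W = sample_space(x1_values, x2_values)

# The number of possible sample points equals n1 * n2.
print(len(W), len(W) == prod(len(v) for v in (x1_values, x2_values)))
```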

PROBLEMS AND EXERCISES 

1. A plot containing $n$ grasshoppers is subject to a poison spray. We assume that $x$, the number that survive, is a random variable. (a) What is the sample space of $x$? Plot this sample space. (b) Suggest at least two composite hypotheses which partially specify the distribution of $x$. (c) Write down simple hypotheses which belong to the composite hypotheses of (b).

2. In a sideshow of a country fair a gambler has a deck of 10 cards of which he asserts 2 are aces. Each player has an opportunity to draw a card successively; the deck is thoroughly shuffled between draws. If he draws an ace, he gets his money back plus four times as much. An onlooker doubts that there are 2 aces: as a check he observes the results of ten successive games. (a) What is the observable random variable? What values may it take? (b) What is the frequency distribution of the random variable if the number of aces remains constant throughout? Is this a simple or a composite hypothesis? (c) What is the frequency distribution if the gambler's assertion is correct? (d) Write down another possible frequency distribution of the random variable.

3. A manufacturing concern tests the strength of 5 of each incoming 100 bottles of acid. Let $x$ denote the number of bottles in the sample below strength. (a) Write down the values the random variable $x$ can take. What is the sample space? (b) Write down a composite hypothesis specifying the frequency function of $x$. Write down a simple hypothesis belonging to this composite hypothesis.

4. Each day the day-shift assembly line produces 25 vacuum cleaners, the smaller evening shift 10. Of these, 8 of those produced on the day shift and 4 of those produced on the evening shift are given a complete overhaul inspection job. Let $x$ = number of cleaners produced by the day shift which are rejected by this thorough inspection, $y$ the number of the evening-shift production which are rejected. (a) Graph the sample space and indicate what values $(x, y)$ may take. (b) Suggest a few hypotheses which specify the frequency distribution of $(x, y)$.


5·1·2. Tests of statistical hypotheses. The problem of testing a statistical hypothesis occurs when circumstances force us to make a choice between two courses of action: either take step A or take step B, with no other course of action contemplated. Moreover, in order to speak of a test of a statistical hypothesis, it is necessary that the desirability of actions A and B depend on the frequency function $p(e)$ of some observable random variables and that $p(e)$ be uncertain.

Imagine, then, one or more observable random variables which we will denote by just one letter $X$. Assume that the frequency function $p_X(e)$ is unknown and let $\Omega$ denote the set of admissible simple hypotheses concerning $p_X(e)$. The set $\Omega$ falls into two parts which we denote by $\omega_A$ and $\omega_B$, respectively. The subset $\omega_A$ is made up of all simple hypotheses such that whenever one of them is true, then the preferred action is A rather than B. The subset $\omega_B = \Omega - \omega_A$ is made up of all other admissible simple hypotheses. That is to say, if $p_X(e)$ is of the nature described by one of the simple hypotheses belonging to $\omega_B$, then our preferred action is B. It follows then that, if $p_X(e)$ were known, then the preferred course of action A or B would be determined unambiguously. Since $p_X(e)$ is not known and may be any one of the functions specified by the admissible hypotheses forming the set $\Omega$, the best we can do is to base the choice of action on the values of the variables $X$ as determined by observation or, in other words, on the position of the sample point $E$ in the sample space $W$.


Definition 5.4. In the above circumstances, any rule R prescribing that
we take action A when the sample point E determined by observation falls within
a specified category of points, and that we take action B in all other cases, is a
test of a statistical hypothesis.


To understand this definition, it is essential to establish its relation to 
various concepts already introduced. In Chapter 1 we discussed the concept
of a rule of inductive behavior. The term “rule of inductive behavior” was
introduced with reference to situations where the desirability of the several 
actions contemplated depends on the nature of the frequency function of 
some observable random variables. This term was used to describe any 
rule for choosing an action in accordance with the particular values of 
these random variables determined by observation. It follows that a test 
of a statistical hypothesis is a rule of inductive behavior. 

In the situations discussed in Chapter 1, we contemplated a choice
among several possible courses of action. For example, there were three
possible actions considered in the problem of Chevalier de Méré. There
are situations where we must choose among an infinity of possible actions.
From Definition 5.4 it follows that in order to speak of a test of a statistical
hypothesis, the number of possible actions must be two. Thus we have the
following definition, which is equivalent to Definition 5.4.


Definition 5.5. When the frequency function $p_X(e)$ is uncertain and
when there are only two actions contemplated, the desirability of which depends
on the nature of $p_X(e)$, then every rule of inductive behavior which determines
the choice between the two possible actions, in accordance with the observed
values of X, is called a test of a statistical hypothesis.

Let H stand for the logical sum of all the simple hypotheses belonging
to $\omega_A$, and $\bar H$ for the logical sum of all the hypotheses belonging
to $\omega_B$. Obviously H and $\bar H$ are statistical hypotheses. If the subset
$\omega_A$ contains more than one simple hypothesis, then H is composite. If
$\omega_A$ contains just one simple hypothesis, say $h_A$, then H coincides with
$h_A$ and is simple. Similarly $\bar H$ may be simple or composite according to
the number of simple hypotheses contained in $\omega_B$.

When the true hypothesis about the frequency function of X belongs
to $\omega_A$, then we say that H is true. Otherwise $\bar H$ is true and H false.
In consequence, $\bar H$ will be described as the negation of H and vice versa.
The choice between the two actions A and B is interpreted as the adoption
or the acceptance of one of the hypotheses H or $\bar H$ and the rejection of
the other. Thus, if the application of an adopted rule of inductive behavior
leads to action A, we say that the hypothesis H is accepted (and, therefore,
$\bar H$ is rejected). On the other hand, if the application of the rule leads to
action B, we say that the hypothesis H is rejected (and, therefore, the
hypothesis $\bar H$ is accepted). Frequently it is convenient to concentrate our
attention on a particular one of the two hypotheses H and $\bar H$. To do so,
one of them is called the hypothesis tested.* The outcome of the test is
then reduced to either accepting or rejecting the hypothesis tested. Plainly,
it is immaterial which of the two alternatives H and $\bar H$ is labeled the
hypothesis tested. However, there is a useful convention on this point
which is discussed in subsection 5.1.3.

The terms “accepting” and “rejecting” a statistical hypothesis are very
convenient and are well established. It is important, however, to keep
their exact meaning in mind and to discard various additional implications
which may be suggested by intuition. Thus, to accept a hypothesis H
means only to decide to take action A rather than action B. This does
not mean that we necessarily believe that the hypothesis H is true. Also,
if the application of a rule of inductive behavior “rejects” H, this means
only that the rule prescribes action B and does not imply that we believe
that H is false.

*The alternative term is the null hypothesis. However, the original term “hypothesis
tested” seems more descriptive.

Example. To illustrate the concept of a test of a statistical hypothesis,
we will return to the problem of Chevalier de Méré playing with doubtful
dice.* Contrary to the assumptions made in Chapter 1, we will assume
that Chevalier de Méré intends to choose between only two actions:
either ($a_1$) to bet on “double six” or ($a_2$) to bet against “double six.”
The choice between these two actions is to be based on the outcomes of
three games which Chevalier de Méré is supposed to witness before making
his bets. The observable random variable X is the number of games in
which “double six” appears at least once. It is assumed that the three
games witnessed and also the following games are played with the same
dice and are completely independent, with the same, though unknown,
probability P that in the course of a game “double six” will appear at
least once. With these assumptions, X is a binomial variable with frequency
function

$$p_X(k) = \binom{3}{k} P^k (1-P)^{3-k}, \qquad k = 0, 1, 2, 3. \tag{5.1.5}$$

If there is no knowledge about the value of P, then the set of admissible
simple hypotheses includes every hypothesis asserting that $p_X(k)$ has form
(5.1.5) and ascribing some specific value between zero and unity to P,
$0 \le P \le 1$.
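As a small illustration, formula (5.1.5) can be evaluated numerically. The Python sketch below is not part of the original example: the function name `binomial_pmf` and the three values of P are our own choices. Each value of P stands for one admissible simple hypothesis.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial frequency function; with n = 3 this is formula (5.1.5)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Each value of P below corresponds to one admissible simple hypothesis.
for P in (0.25, 0.5, 0.75):
    distribution = [binomial_pmf(k, 3, P) for k in range(4)]
    print(f"P = {P}:", [round(q, 4) for q in distribution])
```

For P = 1/2, for instance, the four probabilities are 1/8, 3/8, 3/8 and 1/8.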

Obviously, it is in the interest of Chevalier de Méré to bet on “double
six” when $P > 1/2$ and to bet against “double six” when $P < 1/2$. If
$P = 1/2$, then there is no preference between the two ways of betting, so that
he may as well decide arbitrarily to bet on “double six.” Thus, the set of
admissible hypotheses falls into two parts, say $\omega_1$ and $\omega_2$. The subset
$\omega_1$ includes all hypotheses ascribing to P values $1/2 \le P \le 1$. The
other subset $\omega_2$ is composed of all hypotheses ascribing to P values
$0 \le P < 1/2$. If the true hypothesis is any one of those belonging to
$\omega_1$, then the preferred action is $a_1$. Otherwise, it is $a_2$.

Let the letter H stand for the logical sum of the simple hypotheses
belonging to $\omega_1$ and $\bar H$ for the logical sum of the simple hypotheses
belonging to $\omega_2$. Thus H is the hypothesis that asserts that $P \ge 1/2$;
$\bar H$ is the negation of H and asserts that $P < 1/2$. In this case both H
and $\bar H$ are composite hypotheses. The choice between actions $a_1$ and $a_2$
may now be interpreted as the adoption of one of the hypotheses H or $\bar H$
and the rejection of the other. Either H or $\bar H$ may be called the
hypothesis tested. The outcome of the test reduces, then, to either accepting
or rejecting the hypothesis tested.

*See subsection 1.3.3, p. 6.
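To make Definition 5.5 concrete, the Python sketch below shows the two ingredients of this example: the action that would be preferred if P were known, and one possible rule based on the observed X. The rule shown (bet against “double six” only when X = 0) and the function names are hypothetical choices made for illustration; the text does not prescribe any particular rule.

```python
def preferred_action(P: float) -> str:
    """Action de Mere would prefer if P were known.

    omega_1 (P >= 1/2): bet on double six, action a1 (accept H).
    omega_2 (P < 1/2): bet against double six, action a2 (reject H).
    """
    if not 0.0 <= P <= 1.0:
        raise ValueError("P must lie in [0, 1]")
    return "a1" if P >= 0.5 else "a2"


def rule_R(x: int) -> str:
    """Hypothetical test of H: P >= 1/2, based on X, the number of games
    (out of 3) in which double six appeared at least once.

    Rejects H (action a2) only when X = 0; otherwise accepts H (action a1).
    """
    if x not in range(4):
        raise ValueError("X takes only the values 0, 1, 2, 3")
    return "a2" if x == 0 else "a1"


for x in range(4):
    print(x, rule_R(x))
```

Any other assignment of the four possible values of X to the two actions would be an equally valid test in the sense of Definition 5.5; the definition says nothing yet about which rule is a good one.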



5.1.3. Errors in testing hypotheses. Resuming the general considerations
of the preceding subsection, consider a hypothesis H and its negation
$\bar H$, and assume that action A is preferable to B in all cases when H is
true, and that action B is preferable to A in all cases when $\bar H$ is true.
Let R be some rule of inductive behavior which uniquely determines which
of the two actions A or B to take in accordance with any possible outcome
of the observation.

Obviously there are four possible situations resulting from the
application of rule R.

I. Hypothesis H is true (and, therefore, $\bar H$ is false), and the action
taken is A.

II. Hypothesis $\bar H$ is true (and, therefore, H is false), and the action
taken is A.

III. Hypothesis H is true (and, therefore, $\bar H$ is false), and the action
taken is B.

IV. Hypothesis $\bar H$ is true (and, therefore, H is false), and the action
taken is B.

These four situations are represented conveniently in a four-fold table as
follows:

                       Description of situation
                       when the true hypothesis is
Action taken           H                $\bar H$
A: accept H            satisfactory     error
B: reject H            error            satisfactory
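The four-fold table can also be read as a small lookup. The Python sketch below is only an illustration (the function name `situation` and its string labels are our own); it returns the situation number I to IV together with whether the outcome is satisfactory or an error.

```python
def situation(h_is_true: bool, action: str) -> str:
    """Classify one application of rule R according to the four-fold table.

    h_is_true: True if H is the true hypothesis, False if its negation is.
    action: "A" (accept H) or "B" (reject H).
    """
    if action not in ("A", "B"):
        raise ValueError("action must be 'A' or 'B'")
    if h_is_true:
        return "I: satisfactory" if action == "A" else "III: error (H rejected although true)"
    return "II: error (H accepted although false)" if action == "A" else "IV: satisfactory"


for h_is_true in (True, False):
    for action in ("A", "B"):
        print(h_is_true, action, situation(h_is_true, action))
```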


It will be seen that out of the four possible situations, two are satisfactory
and the other two are not. In fact, in situation II the action taken is A
(= acceptance of the hypothesis H) while the preferred action is B
(= rejection of the hypothesis H). Situation III is the reverse: the preferred
action is A while the action taken is B. In cases II and III we say that
the application of rule R results in an error. It is essential to notice that
there are two different kinds of error possible. The adoption of hypothesis H
when it is false is an error qualitatively different from the error consisting
of rejecting H when it is true. This distinction is very important because,
with rare exceptions, the importance of the two errors is different, and
this difference must be taken into consideration when selecting the
appropriate test.

To illustrate this point, we begin with an exceptional case when the
two kinds of errors are of exactly the same importance. A situation of this
kind occurs in the Example in subsection 5.1.2. In fact, betting against
“double six” while the probability of double six is, say, P = .6 is just as
undesirable as betting on “double six” when the probability is P = .4.
Therefore, in this particular example, there is no difference in the
importance of avoiding the two kinds of errors which may be committed in
testing hypotheses H and $\bar H$. The following two examples illustrate
situations where the importance of these errors is different.

Example 1. In this example we will imagine Chevalier de Méré acting
as a host to a party where gambling with dice is the principal
entertainment. In this capacity de Méré decides not to gamble himself and
to discourage possible attempts at cheating. For this purpose he proposes to
witness occasional cycles of three games each and, according to the outcome
of the cycle, to take one of two actions: ($a_1$) to let the gambling proceed,
or ($a_2$) to remove the owner of the dice from the premises.

Without entering into details of what the observable random variables
may be and what rule of inductive behavior might be adopted by Chevalier
de Méré, we notice that action $a_1$ is preferable to $a_2$ when the probability
of winning an even money bet is about one-half. Action $a_2$ is the preferred
action when this probability is altered so as to favor the owner of the dice.
The two kinds of error which de Méré can commit are: he may let the
game proceed when the dice used are unfair, and he may eject one of his
guests for alleged cheating although his play is absolutely fair. Which of
these two errors is more important to avoid is a subjective matter, but it
is obvious that qualitatively the two errors are very different and their
possible consequences are not the same.

Example 2. The process of manufacturing certain drugs is fairly
complex. Seemingly unimportant departures from the standard procedure
introduce extraneous substances which are highly toxic. The toxicity of these
impurities is occasionally so high that minute quantities which escape
ordinary chemical analysis can be dangerous to persons treated with the
drug. As a result, prior to putting a freshly manufactured lot on the
market, the lot is tested for toxicity by biological methods. Small doses of
the drug are injected into a number of experimental animals, such as mice,
and the effect of the injection recorded. If the drug is toxic, then all or
most of the animals die. Otherwise, the rate of survival is large.

It is usual to postulate that the number X of deaths among the n
animals injected with a specified dose of the drug is a random variable
whose frequency function depends on the toxicity of the drug. Ordinarily,
several groups of animals are injected, each group with a different dose of
the same drug. The experiment results in particular values of the random
variables $X_1, X_2, \dots$, one for each group, where each $X_i$ represents the
number of deaths at a particular dosage.

The test of the drug may lead to one of two possible courses of action:
($a_1$) put the lot of drug on the market, and ($a_2$) return the lot to the
manufacturer for purification or, possibly, to be destroyed. The choice between
these two courses of action is determined by the observed values of the
random variables $X_1, X_2, \dots$. The two kinds of error connected
with actions $a_1$ and $a_2$ are very different, and the importance of avoiding
them strikingly unequal. First consider the case where action $a_1$ is taken
when the appropriate action is $a_2$. This means that the drug is dangerously
toxic but declared harmless through the unavoidable inaccuracies of the
experiment, perhaps because of the unusual resistance of the experimental
animals. Error of this kind may cause death to the patients treated with
the drug. Actual cases of this kind are on record.

Now consider the case where action $a_2$ is taken when the appropriate
action is $a_1$. This means that through inaccuracies in the experiment, a
lot of nontoxic drug is declared toxic. The consequences of this kind of
error are unpleasant. There may be financial loss to the manufacturer and
an increase in the price of the drug. However, the occasional rejection of
a perfectly safe drug is clearly much less undesirable than even an
infrequent death of a patient. Consequently, the error in taking action $a_1$
when the correct action is $a_2$ is more important to avoid than the error in
taking action $a_2$ when the preferred action is $a_1$.

As already mentioned, the situation where the consequences of the two
kinds of errors are of unequal importance is of very general occurrence.
It is true that in many cases the relative importance of the errors is a
subjective matter, and one person may consider it predominantly
important to avoid the errors in taking action $a_1$ while some other person
may consider it more important to avoid the errors in taking action $a_2$.
However, this subjective element lies outside of the theory of statistics.
The essential point to notice is that, in most cases, the person applying
a test of a statistical hypothesis considers one of the possible errors more
important to avoid than the other.

Postulating this to be the ordinary case, we will use the expression error
of the first kind to describe that particular error in testing hypotheses
which is considered more important to avoid. The less important error
will be called the error of the second kind. In the rare cases where the two
kinds of error are of exactly the same importance, it is immaterial which
of them is called the error of the first kind and which the error of the second
kind.

In the problem of testing a drug, the error of the first kind consists in 
marketing a drug which is dangerously toxic. The rejection of a nontoxic 
lot of the drug is the error of the second kind. 

This convention of labeling the two kinds of error is supplemented by
a parallel convention concerning the use of the term hypothesis tested.
Let H be a statistical hypothesis and $\bar H$ its negation. The term hypothesis
tested is attached to H or to $\bar H$ in such a way that the rejection of the
hypothesis tested when it is true is an error of the first kind. It is usual to
adjust the labels H and $\bar H$ so that the hypothesis tested is labeled by H.

In the problem of testing a drug the hypotheses H and $\bar H$ are: “The
drug is toxic” and “The drug is nontoxic,” respectively. In accordance
with the above conventions the hypothesis tested is “The drug is toxic,”
the error of the first kind is marketing a toxic drug, and the error of the
second kind is rejecting a safe drug.

In Example 1 the two alternative hypotheses H and $\bar H$ may be stated
as follows: “The dice are fair” and “The dice are loaded.” For the owner
of the dice the hypothesis tested will be “The dice are fair.” On the other
hand, it is conceivable that Chevalier de Méré could adopt the opposite
attitude.

With the two conventions and the terminology adopted, we can make
the following general statements: (i) The error of the first kind consists of
an unjustified rejection of the hypothesis tested (i.e., of the rejection of the
hypothesis tested when it is true); (ii) The error of the first kind is at least
as important to avoid as the error of the second kind.

In this form, the two statements may seem dogmatic. However, the 
reader will perceive that the dogmatic character of the statements is 
only apparent, and that in reality the two statements are consequences of 
the adopted conventions regarding the terminology. 

PROBLEMS AND EXERCISES 

1. Referring to Exercise 1 of subsection 5.1.1, suggest a hypothesis that
the person testing the poison spray might wish to test. Write down what
the errors of the first and second kinds are as you have phrased the
“hypothesis tested.”

2. Referring to Exercise 2 of subsection 5.1.1, suggest a rule to test the
hypothesis that there is only one ace in the deck. What are the two types
of errors under this hypothesis? Would the gambler regard the “hypothesis
tested” as satisfactory from his viewpoint?

3. What would be the most important type of error for the company
in dealing with the acid (Problem 3 of subsection 5.1.1)? Set up a
hypothesis tested so that this error is of the first kind. What is the
corresponding error of the second kind?

5.1.4. Critical region. Level of significance. Power function of a test.

Let X denote a set of observable random variables with their sample space
W composed of a number of possible sample points e. Let $\Omega$ denote the set
of admissible simple hypotheses concerning the frequency function $p_X(e)$
and let H be the hypothesis tested. Finally, let R be a rule of inductive
behavior constituting a test of the hypothesis H against the set of
admissible hypotheses $\Omega$.

According to Definition 5.4 of subsection 5.1.2, the rule R prescribes
the rejection of H when the observed sample point occupies certain
positions in the sample space W and the acceptance of H in all other cases.
In other words, every test R of the hypothesis H, that is, every method
R of testing the hypothesis H, reduces to the following steps: (i) the sample
space W is divided into two parts, say $W_R$ and the remainder $W - W_R$;
(ii) the hypothesis H is rejected whenever the observed sample point falls
within $W_R$ and accepted otherwise. The part $W_R$ of the sample space used
in this manner is called the critical region corresponding to the test R of
the hypothesis H.

Definition 5.6. Given a test R of a statistical hypothesis H, the term
critical region is used to describe the set of possible positions of the sample
point which, according to rule R, lead to the rejection of H.

It is obvious that to define a test of a statistical hypothesis means to
define the corresponding critical region. The problem of testing statistical
hypotheses is the problem of selecting critical regions. When attempting to
solve this problem, one must remember that the purpose of testing
hypotheses is to avoid errors insofar as possible. Because an error of the
first kind is more important to avoid than an error of the second kind,
our first requirement is that the test should reject the hypothesis tested
when it is true very infrequently. If this foremost condition can be satisfied
in more than one way, then the final choice of the test is made so that
the chances of accepting the hypothesis tested when it is false are
minimized. To put it differently, when selecting tests, we begin by making an
effort to control the frequency of the errors of the first kind (the more
important errors to avoid), and then think of errors of the second kind.
The ordinary procedure is to fix arbitrarily a small number $\alpha$, called the
level of significance, and to require that the probability of committing an
error of the first kind does not exceed $\alpha$. If a critical region satisfies this
condition, then we say that it corresponds to the level of significance $\alpha$.

What the value of the level of significance should be is not a statistical
problem. The levels of significance ordinarily used are .10, .05, .01, .001,
etc. The more important the consequences of the first kind of error, the
smaller the level of significance should be. Once the level of significance
$\alpha$ is chosen, two mathematical problems arise. The first of these problems
consists of determining all possible critical regions corresponding to this
level of significance. The second problem occurs after the solution of the
first. It consists of selecting, out of all possible critical regions corresponding
to the chosen level of significance, the one which most satisfactorily
controls the errors of the second kind.

In connection with the foregoing discussion, it may be asked why one
should ever adopt a level of significance as large as $\alpha = .10$ or $\alpha = .05$.
Whatever the practical problem may be, if it is possible to arrange that
the probability of an error of the first kind is less than .001, why should one
choose a procedure with which an error of the first kind will appear five or,
even, ten times in a hundred? The answer is that a decrease in the level of
significance is paid for by an increase in the probability of errors of the
second kind. Presently we shall discuss this point in some detail. Now,
however, it will suffice to point out that errors of the first kind can be
eliminated altogether by a very easy device, but at a heavy price. To
achieve this we need only make a rule to accept the hypothesis tested
always, irrespective of the observational results. It will be noticed,
however, that whenever the hypothesis tested is false, this rule invariably
leads to an error of the second kind. This can hardly be considered
satisfactory. It follows that to choose an appropriate level of significance one
should carefully balance the importance of the consequences of the two
kinds of errors.
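The price paid for a smaller level of significance can be seen in a small numerical sketch. Purely for illustration, we return to the example of Chevalier de Méré with three games and the hypothesis tested H: P ≥ 1/2, and compare three nested critical regions of our own choosing: the empty region (the "always accept" rule just described), {X = 0} and {X ≤ 1}. Because the probability of X ≤ c decreases as P increases, the largest probability of rejecting H while H is true occurs at P = 1/2. The alternative P = 0.2 is likewise an arbitrary illustrative choice.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial frequency function, as in (5.1.5) when n = 3."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def rejection_probability(region: set, P: float, n: int = 3) -> float:
    """Probability that X falls in the critical region when the true value is P."""
    return sum(binomial_pmf(k, n, P) for k in region)


# Hypothetical nested critical regions; X = number of games with double six.
regions = [set(), {0}, {0, 1}]
for w in regions:
    # For these lower-tail regions the rejection probability decreases in P,
    # so its maximum over H (P >= 1/2) is reached at P = 1/2.
    alpha = rejection_probability(w, 0.5)
    # Probability of accepting H when in fact P = 0.2 (error of the second kind).
    second_kind = 1 - rejection_probability(w, 0.2)
    print(sorted(w), round(alpha, 4), round(second_kind, 4))
```

The empty region never commits an error of the first kind, but it accepts H whatever the truth. Each enlargement of the region raises the level attained (0, then 1/8, then 1/2) while lowering the probability of an error of the second kind at P = 0.2.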

The foregoing discussions are concerned with the probabilities of
rejecting the hypothesis tested either when it is true or when it is false.
It is important to learn how to compute these probabilities. For this
purpose consider the sample space W of the observable random variables
and, in particular, the part w of W chosen to serve as the critical region.
The critical region is, in reality, a set of several possible sample
points, say

$$e_1, e_2, \dots, e_m. \tag{5.1.6}$$

Thus, if the sample point E determined by observation occupies any one
of the m positions (5.1.6), then the hypothesis tested H will be rejected.
Otherwise H will be accepted. It follows that the probability that H will
be rejected is the probability that E will coincide either with $e_1$, or with
$e_2$, or with $e_3$, etc., or with $e_m$. In order to deal with probabilities of this
kind we will write the symbol $E \in w$ to denote the statement “E falls
within w” or “E is an element of w.” Also, the symbol $E = e_k$ will be
used to denote “E coincides with $e_k$.”

With this notation, the probability that the critical region w will reject 
the hypothesis H will be written as 

$$P\{E \in w\} = P\{(E = e_1) + (E = e_2) + \cdots + (E = e_m)\}.$$

Since the sample point E cannot occupy two different positions, say $e_j$
and $e_k$, at the same time, the addition theorem gives

$$P\{E \in w\} = P\{E = e_1\} + P\{E = e_2\} + \cdots + P\{E = e_m\}.$$

It follows that, to compute the probability that a test with critical region 
w will reject the hypothesis tested, it is sufficient to enumerate all possible 
sample points e making up the critical region w and to add up the
probabilities that the observations will yield any one of these points.
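The recipe just stated (enumerate the points of w and add their probabilities) can be sketched in a few lines of Python. We use exact fractions so that the addition theorem can be checked without rounding. The frequency function is assumed, for illustration only, to be the binomial (5.1.5) with P = 1/2, and the region {0, 3} is a hypothetical choice; the function name `probability_of_region` is ours.

```python
from fractions import Fraction
from math import comb


def probability_of_region(pmf: dict, region: set) -> Fraction:
    """P{E in w}: add the probabilities of the sample points that make up w.

    Valid because the events E = e_1, ..., E = e_m are mutually exclusive.
    """
    return sum((pmf[e] for e in region), Fraction(0))


# Illustration (assumed): X binomial with n = 3 and P = 1/2, as in (5.1.5).
P = Fraction(1, 2)
pmf = {k: comb(3, k) * P**k * (1 - P) ** (3 - k) for k in range(4)}

w = {0, 3}  # a hypothetical critical region
print(probability_of_region(pmf, w))
```

Here the two points each have probability 1/8, so the region has probability 1/4; taking w to be the whole sample space gives 1, as it must.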

Let h be an admissible simple hypothesis, member of the set $\Omega$. Let
$p_X(e \mid h)$ be the frequency function of the observable random variables
specified by the hypothesis h. In other words, $p_X(e_k \mid h)$ is the probability
specified by h (or “given h”) that the sample point will coincide with
$e_k$, for any $e_k$ in the entire sample space W. Now let $\beta(h \mid w)$ stand for the
probability that a test based on the critical region w will reject the
hypothesis tested H when h is the true hypothesis. Obviously

$$\beta(h \mid w) = p_X(e_1 \mid h) + p_X(e_2 \mid h) + \cdots + p_X(e_m \mid h).$$

The probability $\beta(h \mid w)$ is a function of the simple hypothesis h, defined
over the set $\Omega$. If the critical region w corresponds to the level of
significance $\alpha$ and if $h'$ is a simple hypothesis either coinciding with the
hypothesis tested H (if H is simple) or belonging to H (if H is composite),
then $\beta(h' \mid w) \le \alpha$. If $h''$ is an admissible simple hypothesis contradictory
to H, then $\beta(h'' \mid w)$ represents the probability that the test based on w
will detect the falsehood of H when the true hypothesis is $h''$. In other
words, $\beta(h'' \mid w)$ is the probability of avoiding an error of the second
kind when the true hypothesis is $h''$. The larger $\beta(h'' \mid w)$, the more
satisfactory the critical region w. If $\beta(h'' \mid w)$ is large, we may say that the
critical region w has high detecting power. If there are two critical regions
$w_1$ and $w_2$, both corresponding to the same level of significance $\alpha$, and if
it happens that $\beta(h'' \mid w_1) < \beta(h'' \mid w_2)$, then we say that, in cases when
the true hypothesis is $h''$, the critical region $w_2$ is more powerful than $w_1$.
An alternative way of describing the situation is to say that of the two
critical regions $w_1$ and $w_2$ considered for testing the hypothesis H against
an alternative simple hypothesis $h''$, the region $w_2$ is more powerful than
the region $w_1$. In accordance with this manner of speaking, the function
$\beta(h \mid w)$ is called the power function of the test of the hypothesis H, when
the test is based on the critical region w. More briefly the function $\beta(h \mid w)$
is described as the power function of the critical region w.
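A minimal sketch of a power function, under assumptions of our own: the setting is again the de Méré example with n = 3 games and hypothesis tested H: P ≥ 1/2, and the critical region is the hypothetical w = {X = 0}, so that $\beta(P \mid w) = (1-P)^3$. The continuum of admissible hypotheses $0 \le P \le 1$ is approximated by a grid of 21 values; the function name `power` is ours.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial frequency function, as in (5.1.5) when n = 3."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def power(P: float, region: set, n: int = 3) -> float:
    """beta(h | w): probability, given the simple hypothesis h (a value of P),
    that the critical region w rejects H."""
    return sum(binomial_pmf(k, n, P) for k in region)


w = {0}  # hypothetical critical region for testing H: P >= 1/2
grid = [i / 20 for i in range(21)]  # simple hypotheses P = 0, 0.05, ..., 1

# Level attained by w: the largest rejection probability over hypotheses in H.
level = max(power(P, w) for P in grid if P >= 0.5)
print("largest rejection probability under H:", level)

# Power at a few alternatives contradictory to H (P < 1/2).
for P in (0.1, 0.2, 0.4):
    print(P, round(power(P, w), 4))
```

On the grid the largest value over H occurs at P = 1/2, where it equals 1/8, so this w corresponds to the level of significance 1/8. The power rises toward 1 as the true P moves toward 0.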

Definition 5.7. Let $\Omega$ be the set of admissible simple hypotheses h and
let H be the hypothesis tested. Let w be a critical region for testing H. Then
the function $\beta(h \mid w)$ defined over the set $\Omega$, which represents the probability,
given h, that the critical region w will reject the hypothesis H, is called the
power function of the critical region w for testing H when $\Omega$ is the set of
admissible hypotheses.

Referring to Chapter 1, the reader will have no difficulty in establishing
that the power function of a critical region w determines the performance
characteristic of the rule of inductive behavior which represents the test
of the hypothesis H based on the region w. In fact, let R be a rule of
inductive behavior representing the test of a statistical hypothesis H and
let this test be based on the critical region w. Then the rule R is concerned
with only two possible actions: either to reject H or to accept H. Namely,
if the sample point falls within w, the rule R prescribes the rejection of
H; if the sample point falls outside w, the rule R prescribes the acceptance
of H. In consequence, the performance characteristic of the rule R is a
set of only two functions of the admissible hypothesis h. One of these
functions is the probability, given h, that the rule R will lead to the
rejection of H, and it is obvious that this function coincides with the power
function $\beta(h \mid w)$ of the test. The other function is the probability, given
h, that the rule R will lead to the acceptance of H, and this obviously is
equal to $1 - \beta(h \mid w)$.

When the hypothesis tested H is false, we shall have occasion to speak
of the power of the region w for testing H against a specific alternative,
say $h_1$. The power is the ordinate of the power function corresponding to the
particular alternative $h_1$ and represents the probability $\beta(h_1 \mid w)$ that, given
$h_1$, the critical region w will reject the hypothesis H. Thus, the power at $h_1$
is the probability that w will detect the falsehood of H when $h_1$ is the true
hypothesis.

In order to illustrate the concepts introduced above, in the following 
subsection we will consider a number of examples of statistical hypotheses 
and their tests. All these examples refer to actual situations in statistical 
work. However, the reader must be aware that the problems arising in 
statistical research are much more complicated than the examples given 
here. In order to increase their illustrative value, these examples are
considerably simplified so as to avoid lengthy computations, etc.

All the examples considered are special cases of a more general study 
of certain classes of statistical hypotheses which is developed in later 
subsections. 


5.2. Statistical hypotheses. Illustrations

5.2.1. Screening for tuberculosis. Consider the situation in a clinic where
a general medical check-up of an individual's health includes the
preparation of several different X-ray photographs of his chest, all made by
exactly the same method, and the subsequent readings of these photographs
by the same radiologist for the purpose of detecting possible signs of
incipient tuberculosis. We will assume that the readings of the several
photographs of the same individual are made in such a way that, as far
as possible, the independence of each subsequent reading from the results
of all previous readings is assured. This may be achieved by delivering to
the radiologist, at one time, a number of X-ray photographs relating to
several patients and arranged in a random order.

In the following we will use the words “single X-ray examination” to
describe the combination of two operations: (a) taking an X-ray
photograph and (b) reading this photograph by the radiologist. Thus, when we
say that an individual is subjected to n X-ray examinations, we mean
that n photographs are taken and that each photograph is read by the
radiologist.




We will assume that previous experience indicates the following: 

(1) If a given patient has no trace of tuberculosis, then the probability
that a single X-ray examination will be erroneous and will classify
the patient as affected by tuberculosis is $p_1 = .01$.

(2) If a given patient is moderately affected by tuberculosis, then the
probability that a single X-ray examination will detect the illness is
$p_2 = .60$.*

Suppose that for each patient, n = 5 independent X-ray examinations
are made, and let X stand for the number of examinations which lead to
the verdict “positive.” Suppose further that, as a matter of routine,
whenever X = 0 the patient is given a clean bill of health and that in all
other cases he is subjected to a closer scrutiny for tuberculosis.

The procedure just described is an example of testing a statistical 
hypothesis. 

(i) Observable random variable and its frequency function. Granting the
existence of constant probabilities $p_1$ and $p_2$, and granting the
independence of the five successive X-ray examinations, X appears as an
observable random variable capable of assuming the values k = 0, 1, 2,
3, 4, 5. Moreover, it is obvious that X is a binomial random variable.
Beforehand, it is unknown whether the given patient is or is not affected
by tuberculosis. Therefore the actual frequency function of X is
unknown and there are two possibilities. Either the patient is not affected
by TB and then

$$p_X(k) = p_X(k \mid \text{not affected by TB}) = \binom{5}{k} p_1^k (1-p_1)^{5-k}, \tag{5.2.1}$$
or the patient suffers from TB and then
$$p_X(k) = p_X(k \mid \text{affected by TB}) = \binom{5}{k} p_2^k (1-p_2)^{5-k}. \tag{5.2.2}$$
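With the numbers given above ($p_1 = .01$, $p_2 = .60$, n = 5), the two ways in which the routine “clean bill of health if X = 0” can go wrong may be computed directly from (5.2.1) and (5.2.2). The Python sketch below does only that; which of the two is the error of the first kind depends on which hypothesis is labeled the hypothesis tested, a question the text takes up next, so the variable names here simply describe the outcomes.

```python
from math import comb

P1 = 0.01  # P(single examination positive | no tuberculosis), from the text
P2 = 0.60  # P(single examination positive | moderate tuberculosis), from the text
N = 5      # number of independent X-ray examinations


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial frequency function, as in (5.2.1) and (5.2.2)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Routine from the text: clean bill of health if X = 0, closer scrutiny otherwise.
p_scrutiny_if_healthy = 1 - binomial_pmf(0, N, P1)  # healthy patient sent for further study
p_clean_bill_if_tb = binomial_pmf(0, N, P2)         # affected patient given a clean bill
print(round(p_scrutiny_if_healthy, 4), round(p_clean_bill_if_tb, 5))
```

By hand, $1 - .99^5 = .0490099501$ and $.40^5 = .01024$: roughly one healthy patient in twenty is sent for closer scrutiny, while about one moderately affected patient in a hundred is given a clean bill of health.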

(ii) Choice between two courses of action. A general health check-up is
made, among other things, in order to determine whether or not the patient
should be treated for tuberculosis. There would be no doubt about the
choice between these two possible actions if it were possible to ascertain
whether or not the patient actually is suffering from tuberculosis. Another
way of stating the same thing is to say that there would be no doubt about
whether or not the patient should be treated for tuberculosis if it were
possible to ascertain which of the two functions (5.2.1) or (5.2.2)
represents the frequency function of the observable random variable X. Thus
the situation described is that in which the choice between two possible
actions, say $a_1$ = treat the patient for tuberculosis and $a_2$ = consider the
patient as free from tuberculosis, depends upon the nature of the frequency
function $p_X(k)$ of an observable random variable X, and this function is
unknown.

*For the sake of simplicity we ignore the category of individuals heavily affected
with tuberculosis which was considered in subsection 4.3.2.

(iii) Set of admissible simple hypotheses. Further, the circumstances
described imply that the frequency function of X may be either (5·2-1) or
(5·2-2). Thus the set $\Omega$ of the admissible simple hypotheses is now
composed of only two elements, viz:

(a) The patient has no tuberculosis: $p = p_1$.

(b) The patient is affected by tuberculosis: $p = p_2$.

(iv) Hypothesis tested. One of these hypotheses is our hypothesis tested
and will be denoted by H. The other hypothesis is the negation of H and
will be denoted by $\bar{H}$. In order to assign these labels in accordance with
the conventions made, we consider the errors which can be committed in
choosing between (a) and (b). If hypothesis (a) happens to be true and
it is rejected, then the patient will suffer some unjustified anxiety and,
perhaps, will be put to some unnecessary expense until further studies of
his health establish that any alarm about the state of his chest is
unfounded. Also, the unjustified precautions ordered by the clinic may
somewhat affect its reputation. On the other hand, should hypothesis
(b) be true and yet the accepted hypothesis be (a), then the patient will
be in danger of losing the precious opportunity of treating the incipient
disease in its early stages when the cure is not so difficult. Furthermore,
the oversight by the clinic's specialist of the dangerous condition
would affect the clinic's reputation even more than an unnecessary alarm.
From this point of view, it appears that the error consisting of rejecting
(b) when it is true is far more important to avoid than the error consisting
of accepting (b) when it is false. It follows that:

The hypothesis tested H is: the patient is affected by tuberculosis: $p = p_2$.

The alternative hypothesis $\bar{H}$ is: the patient is not affected by
tuberculosis: $p = p_1$.

(v) Sample space. Since there is just one observable random variable
X capable of assuming six integer values k = 0, 1, 2, 3, 4, 5, the sample
space W consists of six points only. It is convenient to represent these by
dots on a horizontal straight line as shown in Figure 29.

[Figure 29. Sample Space W of the Random Variable X: six dots at abscissae k = 0, 1, 2, 3, 4, 5 on the k axis, with the point k = 0 labeled “Critical region.”]

(vi) Critical region. The routine of dealing with patients which
consists of suspecting tuberculosis in all those cases when at least one X-ray
examination results in the verdict “positive” and of not suspecting
tuberculosis when all X-ray readings are “negative” amounts to a test of the
statistical hypothesis H using the critical region w composed of just one
possible sample point, k = 0. In fact, if among the n = 5 X-ray readings
there are no “positive” readings, i.e., if k = 0, then the patient is treated
as nontuberculous and this amounts to the rejection of the hypothesis
tested H.

(vii) Probabilities of the two kinds of errors. An error of the first kind
will be committed when H is true and H is rejected. In other words: an
error of the first kind will occur when H is true and the sample point X
occupies the position with abscissa k = 0. Thus, the probability of an
error of the first kind is

(5·2-3)   $P\{X = 0 \mid H\} = (1 - p_2)^5 = (.4)^5 = .010{,}24$.

An error of the second kind will be committed when H is false and it is
accepted. In other words: an error of the second kind will occur when $\bar{H}$
is true and the sample point X fails to occupy the position k = 0. Thus the
probability of an error of the second kind is

$P\{X > 0 \mid \bar{H}\} = 1 - P\{X = 0 \mid \bar{H}\} = 1 - (1 - p_1)^5 = 1 - (.99)^5 = .049{,}01$.

(viii) Power of the test. The power of the test is the probability that
the test will detect the falsehood of H when $\bar{H}$ is true. Its value is

(5·2-4)   $P\{X = 0 \mid \bar{H}\} = (1 - p_1)^5 = (.99)^5 = .950{,}99$.

The interpretation of (5·2-3) and (5·2-4) in operational terms is as
follows: if all the assumptions made are approximately true and if the
described procedure of multiple X-ray examinations is consistently used
by the clinic, then the final diagnosis based on five X-ray examinations
will be “negative” for about one per cent of all the tuberculous patients
and for about ninety-five per cent of all those patients who have not
contracted the disease.
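The three probabilities in (5·2-3) and (5·2-4) follow directly from the binomial model. The Python sketch below is a minimal check, assuming only the values stated above ($p_1 = .01$, $p_2 = .60$, n = 5) and the critical region k = 0; the helper name `binomial_pmf` is our own illustrative choice.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """C(n, k) p^k (1 - p)^(n - k): probability of k "positive" readings."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 5      # independent X-ray examinations per patient
p1 = 0.01  # P(single reading positive | patient free of TB)
p2 = 0.60  # P(single reading positive | patient moderately affected)

# Critical region w = {k = 0}: H (patient has TB) is rejected when no reading is positive.
first_kind = binomial_pmf(0, n, p2)  # H true, yet rejected: (1 - p2)**5
power = binomial_pmf(0, n, p1)       # H false and rejected: (1 - p1)**5
second_kind = 1 - power              # H false, yet accepted

print(f"first kind:  {first_kind:.5f}")
print(f"second kind: {second_kind:.5f}")
print(f"power:       {power:.5f}")
```

Run as written, it should print .01024, .04901 and .95099, in agreement with the figures in the text.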

We may now remark on some oversimplifications made in the example
just described. One simplification consists of postulating that for every
moderately affected individual the probability $p_2$ has the same value.
Naturally, the intensity of illness must vary from one individual to another
and so must $p_2$. Another point worth mentioning is the assumption of
independence of the several successive X-ray examinations of the same
individual. Ordinarily a radiologist will inspect several X-ray photographs
of an individual's chest simultaneously and will make up his mind on
the diagnosis on the combined impression of all the pictures. It is likely
that this procedure is more effective than the one described in the example.




5·2·2. Problem of the Lady tasting tea. The present example is taken
from the well-known book, The Design of Experiments [4], by R. A. Fisher,
an outstanding scholar and founder of the theory of experimentation.
However, the method of treating the problem discussed below differs from
that of Fisher.

“A Lady declares that by tasting a cup of tea made with milk she can
discriminate whether the milk or the tea infusion was first added to the
cup.” Specifically, the Lady's claim is “not that she could draw the
distinction with invariable certainty, but that, though sometimes mistaken,
she would be right more often than not.”

Before the Lady's claim is granted, she will be subjected to an
experiment. She will be required to taste and classify n pairs of cups of tea, each
pair containing one cup of tea made by each of the two methods under
consideration. Care will be taken to insure the essential similarity of
conditions in which the pairs of cups are classified and to eliminate any
possible differences between the cups which are irrelevant to the problem.
Also, the cups of each pair will be presented to the Lady in a random order.
Finally, in order to insure that the classification of each pair of cups of
tea is independent of the preceding pairs of cups of tea, a reasonable
interval of time will be allowed between the successive elements of the
experiment. All this may perhaps be achieved by arranging that the Lady
will classify one pair of cups at breakfast on each of n selected days.

It is agreed that we will grant the Lady's claim and, perhaps, give her
some reward, if the number $X_n$ of pairs of cups classified correctly equals
or exceeds a certain limit t, specified in advance.

It is easy to see that the procedure described is equivalent to a test of 
a statistical hypothesis. 

(i) The observable random variable. Granting the identity of conditions
and the complete independence of the n successive classifications of pairs
of cups of tea, $X_n$ appears to be the number of successes in the course of
n completely independent trials with the same probability p of success in
each particular trial. In other words, $X_n$ is a binomial variable capable of
assuming values k = 0, 1, 2, ..., n with probabilities, say

(5·2-5)   $p_{X_n}(k \mid p) = C_n^k\, p^k (1 - p)^{n-k}$.

The experiment designed to test the Lady's claim will determine the value
of $X_n$.

The probability p of success in each particular trial is unknown and,
therefore, the frequency function (5·2-5) is not completely known.
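To see concretely why (5·2-5) is “not completely known,” one may evaluate it for two hypothetical values of p. This Python sketch assumes n = 5 and the purely illustrative values p = .5 and p = .9; neither comes from an actual experiment, and `freq_function` is a name of our own.

```python
from math import comb

def freq_function(k: int, n: int, p: float) -> float:
    """Formula (5·2-5): p_{X_n}(k | p) = C(n, k) * p**k * (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 5
for p in (0.5, 0.9):  # two hypothetical values of the unknown p
    probs = [freq_function(k, n, p) for k in range(n + 1)]
    # Each row is a different distribution of X_5, yet each sums to 1.
    print(p, [round(x, 5) for x in probs], round(sum(probs), 10))
```

The two rows differ markedly, which is exactly why the choice between actions hinges on the unknown p.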

(ii) Choice between two possible courses of action. As a result of the
experiment we contemplate one of two possible actions: $a_1$ = to grant the
Lady's claim; $a_2$ = to refuse her claim. Should it be possible to ascertain
the actual value of p, then the choice between the actions $a_1$ and $a_2$ would
be clear. For example, if it were known for certain that p = .99, action
$a_1$ would be preferable to $a_2$. On the other hand, if p = .5 the preferred
action is $a_2$. Since the value of p determines uniquely the frequency
function (5·2-5) of the observable random variable $X_n$, it appears that
the desirability of actions $a_1$ and $a_2$ depends on the unknown nature of
the frequency function of the observable random variable. Thus the
situation described is that where it is necessary to test a statistical
hypothesis.

(iii) Set of admissible simple hypotheses. As already mentioned, to
specify completely the frequency function $p_{X_n}(k \mid p)$ it is sufficient to
specify the value of p. Thus, every hypothesis specifying the value of p
is a simple hypothesis. To define the set $\Omega$ of simple hypotheses which are
considered admissible, one must be clear about what values of p are possible
in the present case and here there may be several points of view.

First, one may consider that every number between zero and unity
may be the value of p. The interpretation of values of p which are less
than one-half would be that the Lady does taste the difference between
the two methods of making tea, but is confused about which is which.
Upon the adoption of this point of view, the set of admissible simple
hypotheses is defined to be the set $\Omega_1$ composed of all hypotheses which
ascribe to p any value between the limits $0 \le p \le 1$.

The second way of interpreting the situation is to postulate that p
cannot be less than one-half. If this point of view is adopted, then the set,
say $\Omega_2$, of admissible simple hypotheses will be defined to contain only
those hypotheses which specify the value of p between the limits $.5 \le p \le 1$.
Which of the above two points of view is the appropriate one is not a
statistical problem and we will consider them both.

(iv) Hypothesis tested. Whichever of the two sets $\Omega_1$ and $\Omega_2$ we choose
to consider as the set of admissible hypotheses, action $a_1$ is to be preferred
to action $a_2$ if $p \ne \tfrac{1}{2}$, and action $a_2$ is preferred to action $a_1$ when $p = \tfrac{1}{2}$.
Thus the two alternative hypotheses under consideration are

(a) $p \ne \tfrac{1}{2}$,

(b) $p = \tfrac{1}{2}$.

Obviously (b) is a simple hypothesis while (a) is composite. One of these
hypotheses will be labeled “the hypothesis tested” and denoted by $H_1$.
The other hypothesis will be called “the alternative hypothesis” and
denoted by $\bar{H}_1$. In order to assign these labels in accordance with the
conventions made, we have to consider the consequences of errors which
are possible when choosing between (a) and (b), and here there may be
several different points of view.

First of all, we may consider the situation in the spirit of a parlor game.
Here there appears to be a conflict between the points of view of the Lady
who claims the ability to taste the difference in the methods of making
tea and of the Jury who may grant or refuse the Lady's claim. For the
Lady it is natural to consider that the more important error to avoid is
to refuse her claim if, in fact, she is able to taste the difference in tea.
Thus for the Lady the hypothesis tested is (a). On the other hand, for
the Jury it is natural to consider that the more important error to avoid
is granting the claim when it is not justified. Thus, for the Jury, the
hypothesis tested is (b), asserting that the Lady has no discriminating ability.

The situation may also be considered as a simplified example of serious
research work in which an experimenter is anxious to detect or to “obtain
evidence” of an interesting phenomenon. Instead of the Lady, the subject
of study may be some kind of plant or animal or, perhaps, some particles
in the field of radiation research. The question under study may be whether
or not a specified “treatment” and its absence (= pouring tea first and
then adding milk or vice versa) make a difference in the frequency of some
observable phenomenon.

If we adopt the attitude of an experimenter and consider the problem 
of the Lady tasting tea, then the appraisal of the two kinds of errors 
may be as follows. Each experimenter knows that the assertions based on 
trials are always subject to errors. However, the experimenter would detest 
being wrong frequently and therefore, it is natural for him to make sure 
that the relative frequency of the assertions that some phenomena actually 
exist when, in fact, they do not, be reduced to a reasonably low level. 
For this reason the experimenter would adopt the point of view coinciding 
with that of the Jury, namely, that the more important error to avoid is 
to reject the hypothesis (b) when in fact it is true. On the other hand, it 
is natural for the experimenter to see to it that his experiment, which may 
cost him a great amount of worry, time, and effort, has a reasonable chance 
of detecting the phenomenon, if this phenomenon actually exists. Thus, 
in the last analysis, the attitude of a research worker may be compared 
with that of a very strict and, at the same time, very benevolent Jury 
who intends to give the Lady a fair chance of establishing her claim. 

Adopting this point of view we will consider that:

The hypothesis tested $H_1$ asserts that $p = \tfrac{1}{2}$;

The alternative hypothesis $\bar{H}_1$ asserts that $p \ne \tfrac{1}{2}$.

(v) Sample space. Since there is just one observable random variable
$X_n$ capable of assuming any integer value k from zero to n and no others,
the sample space W is composed of n + 1 points. It is convenient to
represent these points on the $X_n$ axis and to ascribe to them abscissae
k = 0, 1, 2, ..., n.

(vi) Critical region. According to the agreement made, the Lady's
claim will be granted, i.e., the hypothesis tested $H_1$ will be rejected if the
observed value of $X_n$ equals or exceeds a certain number t. It follows that
the critical region, say w(t, n), is composed of all those possible sample
points for which the abscissa k is equal to or is greater than t. If t is an
integer, then the abscissae of possible sample points forming the
critical region w(t, n) are k = t, t + 1, ..., n.

(vii) Probabilities of the two kinds of errors. An error of the first kind
will be committed when the hypothesis tested is true and when the
sample point falls within the critical region. Thus the probability of an
error of the first kind is

$P\{X_n \ge t \mid p = \tfrac{1}{2}\} = P\{X_n = t \mid p = \tfrac{1}{2}\} + P\{X_n = t + 1 \mid p = \tfrac{1}{2}\} + \cdots + P\{X_n = n \mid p = \tfrac{1}{2}\}$.

Using formula (5·2-5) and substituting $p = 1 - p = \tfrac{1}{2}$ we obtain

$P\{X_n \ge t \mid p = \tfrac{1}{2}\} = \left(\tfrac{1}{2}\right)^n \sum_{k=t}^{n} C_n^k$.

We shall set the level of significance $\alpha = .05$.

An error of the second kind will be committed when $H_1$ is false and,
thus, $p \ne \tfrac{1}{2}$, and the sample point fails to fall within the critical region
w(t, n). Therefore the probability of an error of the second kind is given
by the formula

$P\{X_n < t \mid p\} = \sum_{k=0}^{t-1} C_n^k\, p^k (1 - p)^{n-k} = 1 - \sum_{k=t}^{n} C_n^k\, p^k (1 - p)^{n-k}$.
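Both error probabilities of the critical region w(t, n) reduce to sums of binomial terms. The Python sketch below merely transcribes the two formulas above; the function names and the illustrative choice t = n = 5 with p = .75 are our own assumptions.

```python
from math import comb

def prob_at_least(t: int, n: int, p: float) -> float:
    """P{X_n >= t | p}: total probability of the critical region w(t, n)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))

def first_kind_error(t: int, n: int) -> float:
    """Reject H_1 although p = 1/2: equals (1/2)**n * sum_{k=t}^{n} C(n, k)."""
    return prob_at_least(t, n, 0.5)

def second_kind_error(t: int, n: int, p: float) -> float:
    """Accept H_1 although the true value is p != 1/2: P{X_n < t | p}."""
    return 1 - prob_at_least(t, n, p)

print(first_kind_error(5, 5))         # 0.03125
print(second_kind_error(5, 5, 0.75))  # one of infinitely many second-kind errors
```

Note that `first_kind_error` takes no p argument, since $H_1$ is simple, while `second_kind_error` needs one, since $\bar{H}_1$ is composite.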

Contrary to what we have seen in subsection 5·2·1, in the present case
the alternative hypothesis $\bar{H}_1$ is composite and is, in fact, the logical
sum of an infinity of simple hypotheses ascribing to p any of the various
values specified in the definition of the set of admissible hypotheses.
Another way of describing the same thing is to say that in the present
case there are many different situations inconsistent with the hypothesis
tested $H_1$. Naturally, the probability of accepting $H_1$ in any one of these
situations may be, and actually is, different from that in another. In other
words, in the present situation there are as many different probabilities
of the error of the second kind as there are simple admissible hypotheses
alternative to the hypothesis tested, i.e., an infinity. Thus, instead of
giving just one number representing the probability of the error of the
second kind, in the present case it is appropriate to make a graph from
which the probability of an error of the second kind can be obtained easily for
any specified value of p. A convenient graph is that of the power
function of the test.

(viii) Power function of the test. The power function of a test of a
statistical hypothesis is the function defined over the set of admissible
simple hypotheses which, for each such hypothesis h, represents the
probability, given h, that the test will reject the hypothesis $H_1$.

Whether we adopt $\Omega_1$ or $\Omega_2$ as the set of admissible simple hypotheses,
each such hypothesis is perfectly identified by the value of p. Thus, in
this case the power function of the test based on the critical region w(t, n)
is a function of p and may be denoted by $\beta(p \mid t, n)$ where the integers
t and n serve to identify the critical region w(t, n). Since $H_1$ is
rejected when $X_n \ge t$, we have

(5·2-6)   $\beta(p \mid t, n) = \sum_{k=t}^{n} P\{X_n = k \mid p\} = \sum_{k=t}^{n} C_n^k\, p^k (1 - p)^{n-k}$.

If we adopt $\Omega_1$ as the set of admissible simple hypotheses, then (5·2-6)
represents the power function of the test for all values of p from p = 0 to
p = 1. On the other hand, if $\Omega_2$ is the set of admissible simple hypotheses
then, in order to obtain the power function of the test, we need consider
(5·2-6) only for values of p between the limits $.5 \le p \le 1$.



Figure 30 represents the power function $\beta(p \mid 5, 5)$ for n = 5. Its
equation is obtained from (5·2-6) by substituting in it n = 5 and t = 5. This
equation is simply

$\beta(p \mid 5, 5) = p^5$.

Thus, the graph corresponds to the situation where the Lady is asked to
classify n = 5 pairs of cups of tea and it is agreed to grant her claim if all
five pairs are classified correctly. The values of p are measured on the
axis of abscissae. The ordinate of each point on the curve is equal to
$\beta(p \mid 5, 5)$. That is, if the probability that the Lady will correctly classify
a pair of cups is any number p′, then the probability $\beta(p' \mid 5, 5)$ that she
will establish her claim is the ordinate corresponding to the abscissa p′.
For example, if in the long run the Lady can correctly classify a pair of
cups in about 75 per cent of the cases then her chance of establishing her
claim is about .24. Again, if the long-run relative frequency of the Lady's
correct classification is 10 per cent (i.e., if she does taste the difference in
tea quite frequently but consistently misclassifies the cups), then her
chance of establishing her claim by the proposed experiment is very small
indeed, namely $(.1)^5 = .000{,}01$.

The graph in Figure 30 also provides the value of the probability of the
error of the first kind. This is the ordinate of the point on the curve with
abscissa $p = \tfrac{1}{2}$. We have $\beta(\tfrac{1}{2} \mid 5, 5) = .031{,}25$.
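The readings taken from Figure 30 can be reproduced by evaluating the general sum (5·2-6) with t = n = 5, which collapses to $p^5$. This Python sketch is an illustration under that formula only; the name `power` is our own.

```python
from math import comb

def power(p: float, t: int, n: int) -> float:
    """Formula (5·2-6): beta(p | t, n) = sum_{k=t}^{n} C(n, k) p^k (1 - p)^(n - k)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))

for p in (0.5, 0.75, 0.10):
    # With t = n = 5 only the term k = 5 survives, so beta equals p**5.
    print(f"p = {p:.2f}   beta(p | 5, 5) = {power(p, 5, 5):.5f}   p**5 = {p**5:.5f}")
```

The three rows correspond to the first-kind error .031,25, the chance of about .24 at p = .75, and the negligible chance at p = .10.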

5·2·3. Problem of the Lady tasting tea (second part). In subsections
5·2·1 and 5·2·2 we did no more than identify in the two situations
described the various concepts basic to the theory of testing statistical
hypotheses. In the present subsection we will use the example of the
Lady tasting tea to illustrate the usefulness of these concepts in treating
practical problems.

The purpose of computing the power function of a test is to characterize 
the working of the test, and an examination of the power function should 
be the basis for deciding whether or not a given test alone or the given 
test and the experimental procedure combined are satisfactory for the 
purposes for which they are intended. Let us, then, adopt the point of
view of a scholarly experimenter and examine the power function of the
test considered, as represented in Figure 30. In so doing we shall have to
keep in mind: (i) that the experimenter wishes to have a guarantee that
$H_1$ will be rejected when true only infrequently, say not more frequently
than five times in a hundred, and (ii) that when spending his time, energy 
and, probably, money on an experiment, he wishes the experiment to have 
a fair chance of detecting the phenomenon if it actually exists. 

Upon inspecting Figure 30 it will be seen that the proposed experiment
and test satisfy condition (i). In fact, if $H_1$ is true and the probability
$p = \tfrac{1}{2}$, then the chance that the experimenter will grant the Lady's claim
is only $\beta(\tfrac{1}{2} \mid 5, 5) = .031{,}25 < .05$. Operationally this means that if, in
the course of humanity’s research work, assertions of new phenomena are 
made consistently on the ground of experiments arranged like the one 
discussed, then, out of all those cases where the phenomenon investigated 
does not exist, the frequency of false discoveries will be less than five per 
cent. 




This being settled, let us turn to the question of whether or not the
experiment and the subsequent test provide a reasonable chance of
detecting the phenomenon if it actually exists. And here we see at once that
the situation is unsatisfactory.

The situation is most unsatisfactory, indeed, if it is considered that the
set of admissible simple hypotheses is $\Omega_1$. For, in this case, it is admitted
that the Lady's ability to taste the difference in making tea may express
itself in the probability p being less than one-half. For example, if the
Lady tastes the difference in making tea almost always but consistently
mislabels the two methods so that her p = .01, then her chance of
establishing the claim is for all practical purposes equal to zero. This result
suggests that the critical region w(5, 5) may be more satisfactory when
the set of admissible simple hypotheses is $\Omega_2$. However, even if it is taken
for granted that the probability p cannot be less than one-half, Figure 30
shows that the chances of detecting the phenomenon $p > \tfrac{1}{2}$ when it is
true are still unsatisfactory.

It is obvious that if p has the value, say p = .500,01, then, strictly
speaking, the Lady does have the tasting ability she contends but,
operationally, this tasting ability is nil. Therefore the experimenter, in one
interpretation of the situation, and the Lady, in the other, would probably
not regret the fact that the test will lead to the rejection of $H_1$ but rarely
if the actual value of p is .500,01. In this case, the intensity of the
phenomenon is too weak to bother about. On the other hand, both the Lady
and the experimenter may be interested in “detecting” the phenomenon
if its intensity is substantial and expresses itself in a probability of the
Lady's classifying the cups correctly considerably larger than one-half.
What values of p exceed one-half “considerably” and what values exceed
one-half only slightly is a subjective matter and depends on the general
circumstances of the problem. However, in situations of the kind
described, each experimenter will have in his mind a somewhat vaguely
defined limit below which the intensity of the phenomenon studied is
“weak” or “negligible” and above which this intensity is “substantial” or
“strong.” For example, in the experiment with the Lady tasting tea such
a limit may be p = .6 or p = .7, etc. If the experimenter considers that
$p \ge .6$ means a substantial ability of discrimination, then it is natural
for him to arrange his experiment so that, should the value of p be at
least .6, the chance of detecting the phenomenon p > .5 is reasonably large.

From this point of view the graph in Figure 30 shows that the situation
is very unsatisfactory. If p = .6, then the chance of the Lady establishing
her claim is only about eight in a hundred. If p = .7, then this chance is
about seventeen in a hundred. Finally, if p = .9, the chance of detecting
the phenomenon is .59. In other words, if the experimenter insists on
planning his experiments and on treating them statistically as described,
and if he considers that the phenomenon studied is substantially intense
when p = .6, then he must be prepared to find that such substantially
intense phenomena will slip by undetected by his experiments with a
relative frequency of ninety-two per cent. It is obvious that in these
circumstances the experiments planned are worthless and it is appropriate
to consider modifications.
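The figures just quoted (about .08, .17 and .59, and a ninety-two per cent miss rate at p = .6) follow from $\beta(p \mid 5, 5) = p^5$. A minimal Python check, assuming only that formula (the name `power_5_5` is ours):

```python
def power_5_5(p: float) -> float:
    """beta(p | 5, 5) = p**5: the Lady must classify all five pairs correctly."""
    return p**5

for p in (0.6, 0.7, 0.9):
    detect = power_5_5(p)
    # "missed" is the second-kind error: a real ability slips by undetected.
    print(f"p = {p}: detected {detect:.3f}, missed {1 - detect:.3f}")
```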

The first idea which is likely to occur in this respect is that perhaps the
critical region w(5, 5) used in the test was unluckily chosen. To investigate
possibilities in this direction we compute all the non-zero values of the
frequency function $p_{X_5}(k \mid \tfrac{1}{2})$. These values are given in Table 5-1.

Table 5-1
Frequency Function $p_{X_5}(k \mid \tfrac{1}{2})$

k                  0         1         2         3         4         5
probability     .031,25   .156,25   .312,50   .312,50   .156,25   .031,25
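Under $H_1$ every entry of Table 5-1 is $C_5^k / 2^5$, so the table can be regenerated exactly with rational arithmetic. This Python sketch is one way to do it, assuming nothing beyond formula (5·2-5) with p = ½; the variable name `table` is ours.

```python
from fractions import Fraction
from math import comb

n = 5
# Under H_1 (p = 1/2) every term of (5·2-5) reduces to C(5, k) / 2**5.
table = [Fraction(comb(n, k), 2**n) for k in range(n + 1)]
for k, prob in enumerate(table):
    print(k, prob, f"{float(prob):.5f}")
```

Using `Fraction` avoids any rounding, so the symmetry of the table about k = 2.5 is exact.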


Upon inspection of Table 5-1, it is seen that, if the experimenter insists
on the level of significance $\alpha = .05$, that is, if he insists that the frequency
of unjustified rejections of the hypothesis tested when it is true should
not exceed the limit .05, then there are only two critical regions possible.
One of these is the one considered above, namely w(5, 5) consisting of one
single point k = 5, and the other, say w′, is composed of one single
point k = 0. In fact, whatever two possible sample points we take, the
probability that the observed sample point will fall onto either one of
them will exceed the level of significance $\alpha = .05$ (the two smallest
probabilities in Table 5-1 already total .062,5). It follows that with
n = 5 and $\alpha = .05$ it is impossible to have a test of the hypothesis $H_1$
which would be more powerful than the test described.
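The claim that only two critical regions qualify can be confirmed by brute force over all 63 nonempty sets of sample points. The Python sketch below assumes, for illustration, that any nonempty subset of {0, 1, ..., 5} is a candidate region; it keeps those whose probability under $H_1$ does not exceed $\alpha = .05$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

n = 5
alpha = Fraction(5, 100)  # level of significance .05, kept exact
pmf = {k: Fraction(comb(n, k), 2**n) for k in range(n + 1)}  # Table 5-1 under H_1

# Try every nonempty set of sample points as a candidate critical region.
admissible = [
    region
    for size in range(1, n + 2)
    for region in combinations(range(n + 1), size)
    if sum(pmf[k] for k in region) <= alpha
]
print(admissible)
```

It should print `[(0,), (5,)]`, that is, the regions w′ and w(5, 5).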

Having obtained this result we come to the necessity of modifying the 
experiment. The necessary modification is obvious: we must ask the Lady 
to classify more than n = 5 pairs of cups of tea. At the same time it may
be possible to relax the conditions somewhat in the sense that it may be 
possible to grant the Lady’s claim not only when she correctly classifies 
all the n pairs of cups but also if she makes one or two mistakes. 

Guided by these ideas we consider the situation with n = 10, i.e., when
the Lady is asked to classify n = 10 pairs of cups. What should our critical
region be? In accordance with the general description of the test it must
consist of several points with the greatest abscissae. Therefore, it should
certainly contain the point k = 10. However, it may also contain the
point k = 9, and perhaps also the point k = 8, etc., provided that the
probability of an error of the first kind does not exceed the prescribed
level of significance $\alpha = .05$. To determine how many points the critical
region should contain we compute the frequency function of $X_{10}$ as
determined by the hypothesis tested, beginning with

$p_{X_{10}}(10 \mid \tfrac{1}{2}) = P\{X_{10} = 10 \mid H_1\} = (\tfrac{1}{2})^{10} = .000{,}977$.

This value is well below $\alpha$. Therefore we continue and compute

$p_{X_{10}}(9 \mid \tfrac{1}{2}) = P\{X_{10} = 9 \mid H_1\} = 10(\tfrac{1}{2})^{10} = .009{,}766$.

The total of the two probabilities is .010,743. Therefore, should the two
points k = 10 and k = 9 be combined to form a critical region, then, if
this region is consistently used for testing $H_1$, the probability of an error
of the first kind would be .010,743, again well below the level of significance
$\alpha$. This encourages us to compute the probability $P\{X_{10} = 8 \mid H_1\}$ in
order to see whether or not the point k = 8 could also be included in the
critical region. We have

$p_{X_{10}}(8 \mid \tfrac{1}{2}) = P\{X_{10} = 8 \mid H_1\} = C_{10}^{8} (\tfrac{1}{2})^{10} = \tfrac{45}{1024} = .043{,}945$,

which gives a disconcerting result, namely

$P\{X_{10} \ge 8 \mid H_1\} = .000{,}977 + .009{,}766 + .043{,}945 = .054{,}688$.
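The step-by-step accumulation above is easy to reproduce exactly. In this Python sketch (the helper name `tail` is ours), the exact size of w(9, 10) comes out as 11/1024 ≈ .010,742; the .010,743 in the text is the sum of the two rounded terms, not a different value.

```python
from fractions import Fraction
from math import comb

def tail(n: int, t: int) -> Fraction:
    """Exact P{X_n >= t | p = 1/2} = 2**(-n) * sum_{k=t}^{n} C(n, k)."""
    return Fraction(sum(comb(n, k) for k in range(t, n + 1)), 2**n)

for t in (10, 9, 8):
    size = tail(10, t)
    print(f"w({t}, 10): alpha = {size} = {float(size):.6f}, within .05: {size <= Fraction(1, 20)}")
```

Only the addition of the point k = 8 pushes the size past .05, exactly as found above.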

It is seen that, should we adopt the critical region, say w(9, 10), defined
by the inequality $X_{10} \ge 9$, then the probability of an error of the first kind
would be .010,743, considerably smaller than the experimenter is prepared
to risk; however, if we use the critical region w(8, 10), which differs from
w(9, 10) by just one additional point k = 8, then the probability of an
error of the first kind jumps to .054,688, which is more than the chosen
level of significance $\alpha = .05$. If the critical region w(9, 10) is used, then in
order to establish her claim, the Lady must correctly classify at least nine
pairs of cups of tea, i.e., she is allowed to make not more than one mistake
in classifying ten pairs of cups of tea. If the critical region w(8, 10) is used
then the Lady may make two mistakes without losing her claim.

Since there is no law enforcing the level of significance $\alpha = .05$, the
experimenter may consider relaxing the rule somewhat and increasing the
value of $\alpha$ to, say, .055. If this is done then the critical region w(8, 10)
would qualify. However, before deciding, the experimenter will probably
want to know what the consequences of the proposed relaxation would
be. For this purpose he would require the power functions of the two
tests based on the critical regions w(8, 10) and w(9, 10) (the power
functions of the two critical regions, for short). These power functions and
also the power function of the critical region w(5, 5) are given in Figure 31.
They are obtained from the equations

$\beta(p \mid 8, 10) = C_{10}^{8}\, p^8 (1 - p)^2 + C_{10}^{9}\, p^9 (1 - p) + p^{10}$,

$\beta(p \mid 9, 10) = C_{10}^{9}\, p^9 (1 - p) + p^{10}$.

It is seen that the critical region w(9, 10) has little if any advantage
over the critical region w(5, 5). In fact, if p = .6, then the Lady has a
better chance of establishing her claim by classifying correctly all the
n = 5 pairs of cups than by being allowed to be mistaken once in n = 10
classifications: $\beta(.6 \mid 5, 5) = .077{,}76$, while $\beta(.6 \mid 9, 10)$ is only about .046,4.
Therefore, if an experiment with n = 10 pairs of cups to
classify is considered at all, then the critical region w(8, 10) should be
used rather than w(9, 10). The region w(8, 10) appears to be much more
powerful than w(5, 5). In fact, if p = .9, then the probability that the
critical region w(8, 10) will “establish” the fact that p > .5 is greater
than nine in ten (about .93), etc. Therefore, the experimenter may see a very good
reason for abandoning the rigid requirement that the probability of an
error of the first kind does not exceed .05 and for using w(8, 10) with the
probability of an error of the first kind equal to .054,688.
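The comparison of w(5, 5), w(9, 10) and w(8, 10) can be checked numerically from (5·2-6). This Python sketch evaluates the three power functions at p = .5 (the first-kind error), .6 and .9; the function name `power` and the choice of evaluation points are ours.

```python
from math import comb

def power(p: float, t: int, n: int) -> float:
    """beta(p | t, n) from (5·2-6)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))

regions = [(5, 5), (9, 10), (8, 10)]  # (t, n) pairs identifying w(t, n)
for p in (0.5, 0.6, 0.9):
    row = ", ".join(f"w{r}: {power(p, *r):.4f}" for r in regions)
    print(f"p = {p}: {row}")
```

At p = .6 the output should show w(9, 10) below w(5, 5), and w(8, 10) near .1673; at p = .9, w(8, 10) exceeds .9.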

However, if the experimenter's opinion is firm on the importance of
detecting the Lady's ability to taste the difference in tea when p = .6,
then even the experiment with n = 10 pairs of cups of tea appears to be
insufficient. In fact, if p = .6, then $\beta(.6 \mid 8, 10) = .1673$. An average of
17 successful tests out of 100 tests made is certainly not a very good
prospect. Thus, the experimenter will have to continue investigating power
functions of tests relating to larger numbers of pairs of cups. Each time,
after selecting a value of n, say n = 20, 40, 60, 80, the experimenter will
have to determine the value of t such that $P\{X_n \ge t \mid H_1\}$ is
approximately equal to $\alpha = .05$ and then compute the corresponding power
function. Figure 32 gives graphs of the power functions of w(26, 40),
w(37, 60) and w(48, 80). It is seen that with n = 80 pairs of cups to classify,
and with the rule of granting the Lady's claim when she classifies correctly
at least t = 48 pairs of cups, the probability of an error of the first kind is
very nearly .05 and, at the same time, the chance of the Lady establishing
her claim when p = .6 is already above .5. With values of p = .7 or
p = .8 this chance is quite promising, and it is not improbable that the
experimenter will consider the prospects of success in his experiment as
satisfactory. If such an experiment is performed and the Lady fails to
establish her claim, then the experimenter will probably feel safe in
asserting that, even though the Lady may have some discriminating ability,
her p is smaller than .7.



Figure 32. Power functions of w(26, 40), w(37, 60) and w(48, 80).

Figure 32 exhibits the interesting fact that the increase of the size of 
an experiment is subject to the law of diminishing returns: the
improvement in the chances of detecting the Lady’s discriminating ability brought
about by the increase in the experiment from 60 pairs of cups to 80 is less 
pronounced than that corresponding to the increase from 40 to 60. This 
is an example of a general rule that when conditions of an experiment are 
already reasonably good any further improvement is difficult to attain. 
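The search described above can be sketched in a few lines. The sketch assumes that “approximately equal to α” is read as “the smallest t for which P{Xₙ ≥ t | H₁} does not exceed α”; this is one reasonable reading, not a rule stated in the text, and the helper names are hypothetical. Under that reading it should recover the three regions of Figure 32 and print their power at p = .6, .7 and .8, which also exhibits the smaller gain from 60 to 80 pairs than from 40 to 60.

```python
from math import comb

def power(p, t, n):
    """beta(p | t, n): probability that w(t, n) rejects H_1 when the true value is p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def critical_t(n, alpha=0.05):
    """Smallest t with P{X_n >= t | H_1} <= alpha, where H_1 says p = 1/2.
    (Assumed reading of 'approximately equal to alpha'.)"""
    return next(t for t in range(n + 2) if power(0.5, t, n) <= alpha)

for n in (40, 60, 80):
    t = critical_t(n)
    powers = ", ".join(f"beta({p}) = {power(p, t, n):.3f}" for p in (0.6, 0.7, 0.8))
    print(f"w({t}, {n}): alpha = {power(0.5, t, n):.4f}, {powers}")
```

With n = 10 the same rule returns t = 9, i.e. the “rigid” region w(9, 10), rather than the relaxed w(8, 10) discussed earlier.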

The large size of the experiment needed to insure a reasonable chance 
of the Lady establishing her claim may seem surprising. This is an
illustration of the danger involved in deciding on the size of an experiment
solely on intuitive grounds. This is such a general rule that one may fear
that many costly experiments are conducted with practically no prospect 
of detecting the phenomena studied when they actually exist and are 
reasonably intense. 

5-2-4. Problem of the Lady tasting tea (third part). One of the
experimental designs described by Fisher in connection with the problem of
the Lady tasting tea lends itself to an analysis similar to that given above.
We shall consider it in the present subsection.

Assume that, instead of offering the Lady separately n pairs of cups to
classify, we offer her, again separately and with due precautions to insure 
independence and similarity of conditions, n single cups of tea. For each 
particular cup the method of making tea would be determined by some 
random process, such as the tossing of a coin, insuring that the probability 
of the tea being prepared by a certain one of the two methods is equal 
to one-half. As previously, the Lady’s claim will be granted if the total
number of cups classified correctly is t or larger. 

Between the present design of the experiment and that described
previously there is the important difference that now the Lady is not able
to judge the cups by comparison. Therefore, the ability of the Lady tested
in the present experiment is somewhat different from that tested in the
experiment described in subsections 5-2-2 and 5-2-3. It may be said that,
rather than study the Lady’s contention that she is able to discriminate
between the two methods of making tea, now we try to establish whether
or not she is able to identify each (or, at least, either one) of the two
methods. It is obvious that these two abilities are related but not exactly
the same. There are some further differences between the two situations. 

We notice that the probability of identifying tea made by one method,
say p₁, may conceivably be different from the probability, say p₂, of
identifying tea made by the other method. Finally, the Lady’s lack of the
ability to identify the methods of making tea need not imply that
p₁ = p₂ = ½. In fact, the assumption that the Lady is not able to identify
either of the two methods simply means that the frequency of her assertions
that the tea was made by the first method is the same whether this
particular method or the alternative method actually was used. In terms of the
probabilities p₁ and p₂, this means only that p₁ = 1 − p₂, where p₁ and
p₂ may be very different from one-half. For example, the probability p₁
of identifying the first of the two methods may be as large as .9, and this
need not signify that the Lady has any sort of ability to taste by which
particular method the tea was made. For this to occur it is necessary and
sufficient that, when systematically given cups of tea made by the second
method, the Lady persists in asserting nine times out of ten that the
method used in making tea was the first method. In terms of the
probability p₂ this means 1 − p₂ = .9 = p₁. Conversely, if the Lady does have
the ability of tasting the method by which the tea was made, this
circumstance will express itself in a different frequency of assertions “the
tea was made by the first method” computed separately for cups of tea
made by one method and separately by the other. Thus, the assumption
of no sensory perception which allows the Lady to identify the methods of
making tea is equivalent to the assumption p₁ = 1 − p₂.

These remarks suggest that in the present case, the set of admissible
simple hypotheses will have to be of a more complicated nature than the
sets Ω₁ and Ω₂ considered in subsections 5-2-2 and 5-2-3. We shall see,
however, that there is no real difference in the two situations. The reason
is that, although in the present case there are two probabilities to consider,
p₁ and p₂, there is just one random variable Yₙ involved and the
frequency function of this variable depends not on p₁ and p₂ taken separately,
but on the average ½(p₁ + p₂).

In studying the nature of the random variable Yₙ we first notice that
it is defined as the number of successes in n independent attempts at a
correct identification of the method by which the tea in a given cup was
made. Further, the probability of success in each of these attempts is the
same, say π, because, according to the postulates adopted, the
circumstances in which each attempt is made do not vary. The value of π is
easily computed by considering that the Lady’s success may come about
in the following two mutually exclusive ways: (i) the tea in a given cup
may be made by the first of the two methods (with probability ½) and
she may identify this method (with probability p₁); and (ii) the tea may be
made by the second method (with probability ½) and she may identify this
method (with probability p₂). The application of the addition and the
multiplication theorems then gives

$$\pi = \tfrac{1}{2}p_1 + \tfrac{1}{2}p_2 = \tfrac{1}{2}(p_1 + p_2).$$
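To make the formula for the probability of success concrete, the following sketch (hypothetical helper names, Python standard library only) computes π from p₁ and p₂ and simulates the single-cup design, with each cup’s method fixed by a simulated coin toss. With p₁ = .9 and p₂ = .1, the case of a Lady who says “first method” nine times out of ten whatever she is served, π comes out as ½, as the argument above requires; with p₁ = p₂ = .6 it is .6.

```python
import random

def success_probability(p1, p2):
    """pi = (1/2) p1 + (1/2) p2 when each cup's method is chosen by a fair coin."""
    return 0.5 * p1 + 0.5 * p2

def simulate_single_cups(p1, p2, n, rng):
    """Number Y_n of correct identifications among n single cups (hypothetical model)."""
    successes = 0
    for _ in range(n):
        first_method = rng.random() < 0.5   # coin toss fixes how this cup is made
        p_correct = p1 if first_method else p2
        successes += rng.random() < p_correct
    return successes

rng = random.Random(2024)
for p1, p2 in [(0.9, 0.1), (0.6, 0.6)]:
    n = 100_000
    print(p1, p2, success_probability(p1, p2), simulate_single_cups(p1, p2, n, rng) / n)
```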

This probability of success applies to every cup offered to the Lady.
Thus Yₙ is necessarily a binomial variable with frequency function

$$p_{Y_n}(k \mid \pi) = C_n^k\, \pi^k (1 - \pi)^{n-k}, \tag{5-2-7}$$

depending on the value of π. We have just seen that the hypothesis: “The
Lady has no ability to identify the method of making tea” is equivalent
to the hypothesis p₁ = 1 − p₂. In terms of the probability π, this
hypothesis is π = ½ and the alternative hypothesis π ≠ ½. It follows that
the set of admissible simple hypotheses is composed of all such hypotheses
h which assert that Yₙ is a binomial variable with frequency function
(5-2-7) and such values of π as may be considered appropriate on
non-statistical considerations. As in subsection 5-2-2, two different points of
view may be adopted. According to one, the ability to taste each method
of making tea may be combined with mislabeling these methods. This
presumption implies the possibility that p₁ and p₂ may be any numbers
between zero and unity and it will be seen that, from this first point of view,
the set of admissible simple hypotheses concerning the variable Yₙ is the
set Ω₁ already considered in subsection 5-2-2. In fact any such hypothesis
implies a value of π arbitrarily selected between the limits 0 ≤ π ≤ 1.

According to the second possible point of view, if the Lady does have
some ability to identify the method of making tea, then her assertion “the
tea is made by the first method” will occur more frequently in cases when
the method used was actually the first than in cases when it was not.
Thus p₁ > 1 − p₂ and π > ½. In this case the set of admissible simple
hypotheses concerning Yₙ is composed of hypotheses asserting the
frequency function (5-2-7) with a value of π between the limits ½ ≤ π ≤ 1.
In other words, this is the set Ω₂ considered in subsection 5-2-2.

Whichever of the above two points of view is adopted, the hypothesis
tested is that π = ½; i.e., this is the hypothesis H₁ of subsections 5-2-2
and 5-2-3.

It follows from the above remarks that the statistical side of the problem 
in the present situation and in that described in subsection 5-2-2 is 
exactly the same so that we need not repeat the discussion of the possible 
modification of critical regions, of the power of the test associated with 
varying values of n, etc. However, it is important to notice that the
experimental side of the situation is very different in the two cases. In
particular, at least at first sight, the experimental procedure described in 
this subsection may appear twice as efficient as that described in subsection 
5-2-2. It may be argued, in fact, that in order to insure the same power 
function, the design of subsection 5-2-2 will require the use of twice as 
many cups of tea as the present design. A closer examination of the 
problem reveals that the advantage claimed is problematical. The reason 
is that a given level of sensory perception may be expected to express 
itself in a probability p of a correct discrimination between the two methods 
of making tea considerably higher than π, which is the arithmetic mean
of the probabilities p₁ and p₂ of correct identification of these methods.
For example it may happen that, for the same person, the value of p =
.75 combines with the values of p₁ = p₂ = .6. Should this actually be the
case then, in order to achieve the same probability of detecting that the
Lady has some ability to taste the difference in the making of tea, the
number of cups used singly would have to be considerably higher than
the number of pairs of cups if these cups are to be classified in pairs and
judged by comparison. Thus the advantage of the experimental design
described in the present section may be more apparent than real. This
circumstance, however, is subject to experimental verification.
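The comparison in the last paragraph can be put in numbers under stated assumptions. The sketch below takes the hypothetical values p = .75 for judging pairs and p₁ = p₂ = .6 (so π = .6) for single cups, uses a one-sided binomial test with first-kind error at most .05, and searches for the smallest number of trials giving power at least .9. The target .9, the search limit and the helper names are arbitrary choices made for illustration.

```python
from math import comb

def pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def critical_t(n, alpha=0.05):
    """Smallest t with P{X_n >= t | p = 1/2} <= alpha, accumulating the tail from k = n down."""
    t, tail = n + 1, 0.0
    while t > 0 and tail + pmf(t - 1, n, 0.5) <= alpha:
        t -= 1
        tail += pmf(t, n, 0.5)
    return t

def power(p, t, n):
    """Probability that X_n >= t when each trial succeeds with probability p."""
    return sum(pmf(k, n, p) for k in range(t, n + 1))

def trials_needed(p, target=0.90, alpha=0.05, n_max=500):
    """Smallest number of trials whose level-alpha test detects p with probability >= target."""
    for n in range(1, n_max + 1):
        t = critical_t(n, alpha)
        if t <= n and power(p, t, n) >= target:
            return n
    return None

pairs = trials_needed(0.75)    # assumed p = .75 when cups are judged in pairs
singles = trials_needed(0.60)  # assumed p1 = p2 = .6, so pi = .6, for single cups
print(f"pairs: {pairs} ({2 * pairs} cups); single cups: {singles}")
```

Under these particular assumptions the single-cup design needs well over twice as many cups as the paired design; other assumed values of p, p₁ and p₂ could change the comparison, which is why the text leaves it to experiment.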

5-2-5. Problem of the Lady tasting tea (concluded). In this subsection
we will consider still another design of the experiment to test the Lady’s
ability to differentiate the taste of tea. This is actually the design which
seems to be favored by Fisher. It will be seen, however, that it has certain
serious disadvantages. Fisher’s description [4] of the experiment follows:

“Our experiment consists in mixing eight cups of tea, four in one way
and four in the other, and presenting them to the subject for judgment in 
a random order. The subject has been told in advance of what the test 
will consist, namely that she will be asked to taste eight cups, that there 
shall be four of each kind, and that they shall be presented to her in a
random order, that is, in an order not determined arbitrarily by human 
choice, but by the actual manipulation of the physical apparatus used in 
games of chance, cards, dice, roulettes, etc., or, more expeditiously, from
a published collection of random sampling numbers purporting to give the 
actual results of such manipulations. Her task is to divide the eight cups 
into two sets of four, agreeing, if possible, with the treatments received.” 

Generalizing the situation, we will consider that the Lady is given 2n 
cups, n of each kind. Her claim will be granted if the number Zₙ of correctly
classified cups with tea made by one particular method is at least equal to t.

Obviously Zₙ can assume n + 1 different values 0, 1, 2, ..., n. Thus
the sample space W is composed of n + 1 points with abscissae k = 0,
1, 2, ..., n, exactly as in subsections 5-2-3 and 5-2-4. Also, as previously,
the critical region is composed of points k = t, t + 1, ..., n. The
hypothesis that the Lady has no discriminating ability expresses itself in the
statistical hypothesis tested, say H₂, that any set of n cups is as likely to
be ascribed to the first method of making tea as any other set. This implies
that, on the hypothesis H₂, the variable Zₙ is a hypergeometric variable
with frequency function

$$p_{Z_n}(k \mid H_2) = \frac{C_n^k\, C_n^{n-k}}{C_{2n}^n} = \frac{(C_n^k)^2}{C_{2n}^n},$$

because $C_n^k = C_n^{n-k}$. In the particular case described by Fisher, n = 4.
The level of significance insisted on is α = .05. Then it appears that, with
n = 4, the value of t must be 4. Thus, in order to establish her claim, the
Lady is required to classify all the eight cups correctly, without a single
error. Table 5-2 gives the numerical values of $p_{Z_4}(k \mid H_2)$.


Table 5-2

Frequency Function $p_{Z_4}(k \mid H_2)$

k                          0       1       2       3       4
$p_{Z_4}(k \mid H_2)$    1/70   16/70   36/70   16/70    1/70


It is seen that, if the hypothesis tested H₂ is true (the Lady has no
discriminating ability), then the probability of the Lady establishing her
claim is only 1/70. In other words, the probability of an error of the first
kind is 1/70, well below the level of significance α = .05. Without
exceeding the limit α, the critical region could not be increased to include
the point k = 3. Thus, from the point of view of errors of the first kind
(the point of view of a strict Jury), the experiment is designed satisfactorily.
Not so from the point of view of the Lady nor from the point of view of
an experimenter anxious to have a reasonable chance of detecting the
phenomenon if it actually exists and is substantially intense. The passage
from Fisher relating to this circumstance is as follows:

“A probable objection, which the subject might well make to the
experiment so far described, is that only if every cup is classified correctly
will she be judged successful. A single mistake will reduce her performance
below the level of significance. Her claim, however, might be, not that she
could draw the distinction with invariable certainty, but that, though
sometimes mistaken, she would be right more often than not; and that
the experiment should be enlarged sufficiently, or repeated sufficiently
often, for her to be able to demonstrate the predominance of correct
classifications in spite of occasional errors.”

In the next paragraph Fisher notes that, if the Lady is offered twelve
cups instead of eight, then she could be allowed to make one
misclassification without losing her claim and still the experiment will conform with
the level of significance α = .05. “By increasing the size of the experiment,
we can render it more sensitive, meaning by this that it will allow of the
detection of a lower degree of sensory discrimination, or, in other words,
of a quantitatively smaller departure from the null hypothesis.”
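The entries of Table 5-2 and Fisher’s remark about twelve cups both follow from the hypergeometric frequency function of Zₙ under H₂. The sketch below uses exact fractions and hypothetical helper names. With n = 4 the only region within α = .05 is k = 4; with n = 6 (twelve cups) it is k = 5, 6, which tolerates one misclassified cup of each kind, with first-kind error 37/924 ≈ .040.

```python
from fractions import Fraction
from math import comb

def null_pmf(k, n):
    """p_{Z_n}(k | H_2) = (C_n^k)^2 / C_{2n}^n as an exact fraction."""
    return Fraction(comb(n, k) ** 2, comb(2 * n, n))

def first_kind_error(t, n):
    """Probability under H_2 of the critical region k = t, t + 1, ..., n."""
    return sum((null_pmf(k, n) for k in range(t, n + 1)), Fraction(0))

def smallest_t(n, alpha=Fraction(5, 100)):
    """Smallest t whose critical region keeps the error of the first kind <= alpha."""
    return min(t for t in range(n + 2) if first_kind_error(t, n) <= alpha)

print([int(null_pmf(k, 4) * 70) for k in range(5)])   # [1, 16, 36, 16, 1]
for n in (4, 6):
    t = smallest_t(n)
    print(f"2n = {2 * n} cups: t = {t}, alpha = {first_kind_error(t, n)}")
```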

The passages quoted illustrate Fisher’s consciousness of the problem of 
the error of the second kind and of the danger that, with a small-scale
experiment, the power of the statistical test may be so small as to make the
experiment a hopeless undertaking. The best guard against dangers of this
kind is to determine the size of the experiment after computing a few
power functions as was done in subsection 5-2-3. However, in order to be
able to compute the power function of a test, it is necessary to be clear 
about the set of simple hypotheses which are considered admissible. 

Clearly, in relation to any given class of phenomena, the set Ω of
admissible simple hypotheses is most intimately connected with the
experiment designed to study these phenomena. The simple admissible
hypotheses are concerned with the observable random variables and, if we
alter the design of the experiment, then this change must produce some
change in the nature of the observable random variables and may modify
their relation to the phenomena. Circumstances of this sort were illustrated
in subsections 5-2-3 and 5-2-4.

Among other things, the connection between the experimental design 
and the set Ω of admissible hypotheses expresses itself in the fact that
with certain designs the nature of the set Ω is clear and with some other
designs it is not clear. In the latter case, of course, it is impossible to 
compute the power function of the statistical test contemplated and, 
consequently, it is impossible to answer with any sort of precision the most 
important questions of whether or not the proposed experiment and the 
subsequent statistical treatment have a satisfactory chance of detecting 
the phenomena studied when these phenomena exist and are substantially 
intense. 

The design of the experiment described in this subsection is a case in 
point. Fisher’s description of the situation makes it perfectly clear that,
if the hypothesis tested is true, then the observable variable Zₙ is a
hypergeometric variable. However, the part of this description relating to the
case where the hypothesis tested is false and the Lady does have a
substantial but not a complete ability to discriminate the taste of tea, is
insufficient to determine the corresponding frequency function of the variable
Zₙ. It will be noticed that although Zₙ is defined as the number of
successes in classifying the n cups of tea made by a particular one of the two
methods, nevertheless Zₙ is not a binomial variable. This circumstance is
due to the fact that the Lady is supposed to judge the cups by comparison
and, presumably, is allowed to taste and retaste particular cups repeatedly.
This deprives her experiment of all semblance of a sequence of
independent trials. Whenever she makes up her mind about one single cup,
this classification influences all the others because the Lady knows that
among the 2n cups given to her there are exactly n cups of each kind.
And, if Zₙ is not a binomial variable, and if, due to the Lady’s ability to
discriminate, it is not the familiar hypergeometric variable with frequency
function

$$p_{Z_n}(k) = \frac{(C_n^k)^2}{C_{2n}^n},$$

then what is it? When pressed for a more precise interpretation of her
contention that she would classify cups correctly more often than not, the
Lady is likely to reply that, if given for independent classification a
sequence of pairs of cups, each pair containing one cup of each kind, then
the frequency of correct classification will be greater than 50 per cent.
She may also insist that this frequency will be at least equal to some
specific limit such as .75, etc. This reply is intelligible and perfectly
sufficient to determine the size of the experiment needed for the Lady to have
any desired level of probability of success if her sensory perception is as
intense as she contends. This, however, applies directly to the design of
subsection 5-2-2. On the other hand, faced with the design of the present
subsection, the Lady would have to exercise her imagination in making
precise her claim, so to speak, independently for each number 2n of cups
contemplated. For example, when told that a complete lack of
discrimination implies that the frequency function of Z₄ is represented by the figures
in Table 5-2 and, in particular, that P{Z₄ = 4} = 1/70, she may contend
that, due to her ability to discriminate, P{Z₄ = 4} is at least equal to 5/70.

However, this contention, combined with the general description of the 
situation, does not determine the probability of the Lady's success in an 
enlarged experiment with twelve cups of which she is required to classify 
correctly at least five pairs. Any numerical evaluation of this probability
would depend on a fresh guess. Thus, with the present design of the
experiment, the Lady’s protests, and the experimenter’s solicitude that the
experiment be sufficiently large so that she is able to demonstrate her ability
would have to remain without a convincing answer.

All the above discussions lead to the conclusion that, when considering 
an experiment designed to test the existence of some phenomenon, it is 
essential to visualize as clearly as possible how this phenomenon can 
manifest itself in the conditions of the experiment considered. More 
specifically, it is essential to be clear about the connection between the
frequency function of the observable random variables to be determined
by the experiment and some appropriate measure (or measures) of the
intensity of the phenomenon. If the connection is clear, then it is a simple
matter to compute the power function of the proposed test and then to
determine whether or not the size of the experiment is sufficient to insure
that the phenomenon of substantial intensity will have a satisfactory
chance of being detected. If the connection between the intensity of the
phenomenon and the frequency functions of the observable variables is
vague, and the computation of the power functions relating to
experiments of increasing sizes is impossible, then the all-important question of
how large the experiment should be in order to insure a reasonable
probability of success cannot be answered.

5-2-6. Relation between theory and reality. In the preceding
subsections, when discussing the consequences of the consistent application of
the various tests, we have repeatedly made statements like the following:
“If the hypothesis tested be true, then it will be rejected with the relative
frequency so and so,” and “If the hypothesis tested be false and the
Lady’s ability to discriminate is measured by the probability p = .7,
then, in repeated trials, her claim will be established with a relative
frequency thus and such.” It is essential to be quite clear about the exact
meaning of such statements and about the conditions under which they 
can be verified by observation. 
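Statements of the form “rejected with the relative frequency so and so” can be checked by simulation whenever the postulates hold. The sketch below assumes, as in subsection 5-2-2, n independent classifications each correct with probability p, repeats the experiment many times with the region w(8, 10), and compares the observed relative frequency of rejection with the binomial value. The seed, the number of repetitions and the helper names are arbitrary.

```python
import random
from math import comb

def exact_rejection_probability(p, t, n):
    """P{X_n >= t} for a binomial X_n: what the theory predicts."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

def observed_rejection_frequency(p, t, n, repetitions, rng):
    """Repeat the n-pair experiment and return the relative frequency of X_n >= t.
    Assumes independent classifications, each correct with probability p."""
    rejections = 0
    for _ in range(repetitions):
        x = sum(rng.random() < p for _ in range(n))
        rejections += x >= t
    return rejections / repetitions

rng = random.Random(7)
for p in (0.5, 0.7):
    observed = observed_rejection_frequency(p, 8, 10, 20_000, rng)
    predicted = exact_rejection_probability(p, 8, 10)
    print(f"p = {p}: observed {observed:.4f}, predicted {predicted:.4f}")
```

The agreement rests entirely on the simulated trials obeying the postulates; the following paragraphs explain why a real experiment need not.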

One important circumstance to remember is that statistical tests apply
to statistical hypotheses and that statistical hypotheses are (see subsection
5-1-1) assumptions regarding the frequency functions of observable random
variables. As a result, any statement regarding the performance of a
statistical test depends upon the postulate that the observable variables
are random variables and possess the properties specified in the definition
of the set Ω of the admissible simple hypotheses. Granting this and barring
possible mistakes in algebra and arithmetic, the statements regarding the
performance of the tests are correct. However, this fact does not imply
that, if one repeats a given experiment many times and applies the same
test to each set of observations, the frequency of correct outcomes will be
even approximately as predicted from the power function. The reason for
the possible discrepancy is that the actual experimental procedure need
not conform with the postulates made.

The second important circumstance to have in mind is that in planning
an experiment, we usually have in mind the verification of a hypothesis
ℋ which is not a statistical hypothesis. In order to be able to use the theory
of statistics, an experiment is planned and it is postulated that its outcome
will determine particular values of some random variables. Regarding these
random variables, a statistical hypothesis H is then formulated so as to
be intimately related to the nonstatistical hypothesis ℋ. However, H is
never identical with ℋ. Finally, the test of H is considered to be, in a
sense, equivalent to the test of ℋ. The problem of the Lady tasting tea
provides a good illustration of the distinctions made. Whatever the
experimental design, the psychologist interested in the sensory perceptions of the
Lady is concerned with the hypothesis ℋ: “The Lady does not taste the
difference in making tea.” Also, in all cases, the alternative hypothesis ℋ̄
is “the Lady does taste the difference.” Since the formulation of ℋ and
ℋ̄ does not mention any random variable, it is obvious that they are not
statistical hypotheses. In the preceding subsections we have discussed the
possibilities that instead of testing the non-statistical hypothesis ℋ, the
following statistical hypotheses might be tested:

(i) H₁ in relation to the random variable Xₙ, to be determined by one
kind of experiment;

(ii) H₁ in relation to the random variable Yₙ, to be determined by
another kind of experiment;

(iii) H₂ in relation to the random variable Zₙ, to be determined by a
third kind of experiment.

In spite of efforts to achieve a close connection between the primary
nonstatistical hypothesis ℋ and the corresponding statistical hypothesis H,
this connection is frequently loose and there is the most unfortunate
possibility of one being correct while the other is false. In this case the frequency
of correct conclusions regarding the statistical hypothesis tested may be
in perfect agreement with the predictions of the power function, but not
the frequency of correct conclusions regarding the primary hypothesis ℋ.

To illustrate this point we shall return to the experiment described in
subsection 5-2-2. Assume that the hypothesis ℋ is true and thus that
the Lady has no ability to discriminate between the two methods of
making tea. Assume, further, that she has become accustomed to making
tea by one of the two methods and to having it served in moderately thick
cups. Also she is accustomed to tea made by the other method while
visiting with some friends where tea is served in thin cups. As a result,
the idea of the taste of tea made by each method may have associated
itself with the thickness of the china or some other such detail which she
can feel with her lips. Now, if it happens that when preparing pairs of
cups of tea for the experiment one of the two methods is predominantly
used with thinner cups and the other predominantly with thicker cups,
the resulting variable will be a binomial with probability of success
p ≠ ½. Thus the hypothesis H will be false in spite of the fact that ℋ is
true. It is not difficult to visualize how ℋ may be false and H true. Because
of the above circumstances, in the planning of experiments to be treated
statistically, it is important to make efforts to insure that the observable
random variables reasonably conform with the postulates made in defining
the set of admissible simple hypotheses and that extraneous factors, such
as the thickness of the china, do not destroy the correspondence between
the primary hypothesis ℋ and the corresponding statistical hypothesis H.
The methods by which these two goals may be approached are a part of the
theory of experimentation. However, because of the intimate link with the
theory of testing statistical hypotheses, it is appropriate to give the matter
a brief discussion.
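The thick-china example can be sketched with a deliberately simple, hypothetical model. The Lady has no taste for the method at all and calls the thin cup “method A” with probability r, while the experimenter puts method A into the thin cup with probability q. A correct classification then has probability p = qr + (1 − q)(1 − r). If q = 1, the test of H₁ faces p = r ≠ ½ although ℋ is true; if q = ½, the factor has no effect on p. The model, the names and the value r = .8 are assumptions made for illustration.

```python
import random

def correct_probability(q, r):
    """Hypothetical model: no taste for the method, verdict driven by the china.
    q: probability that the thin cup holds method A (experimenter's choice),
    r: probability that the Lady calls the thin cup 'method A'."""
    return q * r + (1 - q) * (1 - r)

def simulate_pairs(q, r, n, rng):
    """Number of correctly classified pairs out of n under the same model."""
    correct = 0
    for _ in range(n):
        thin_is_a = rng.random() < q       # how the experimenter fills the cups
        calls_thin_a = rng.random() < r    # her verdict ignores the tea itself
        correct += thin_is_a == calls_thin_a
    return correct

rng = random.Random(5)
for q in (1.0, 0.5):   # method A always in the thin cup versus a coin toss
    n = 20_000
    print(f"q = {q}: p = {correct_probability(q, 0.8):.3f}, "
          f"observed {simulate_pairs(q, 0.8, n, rng) / n:.3f}")
```

The second case, in which a chance device decides which cup receives which method, anticipates the principle of randomization discussed below.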

Broadly, the methods of bringing about an agreement between the
predictions of statistical theory and observation may be classified under two
headings. 

(a) Adaptation of the statistical theory to the enforced circumstances 
of observation. 

(b) Adaptation of the experimental technique to the postulates of the 
theory. 

The situations referred to in (a) are those in which the observable random 
variables are largely outside of the control of the experimenter or observer. 

A situation of this kind may be exemplified by experiments with
insecticides. In order to test the effectiveness of some insecticides a field
with a crop is divided into a number of plots. The several treatments to be
compared are then applied to several plots, and the effectiveness of the
treatment is judged by the number of insects surviving the treatment.
Among other things, statistical tests applicable to the results of such
experiments depend on the nature of the random variable X representing
the number of insects to be found on a plot without any treatment applied.
First attempts at statistical treatment of this problem were based on the
presumption that X is a Poisson variable. However, subsequent
observation contradicted this presumption in a most decisive way. Furthermore,
the biology of the insects concerned indicated a specific machinery behind
the spatial distribution of the organisms, which is incompatible with the
Poisson Law. Notably, this machinery consists of the following. Moths fly
over the field and deposit batches of eggs. If the field is uniform, there is
no indication against the possibility that the number of batches of eggs
per unit area will be distributed in accordance with the Poisson Law.
When the larvae hatch out, they are concentrated at one spot and gradually
disperse in search of food. As a result, at the moment the counts are made,
the distribution of larvae in single plots shows strong signs of “contagion.”
If one larva is found in a plot, in most cases there will be many more larvae
in the same plot, because a batch of eggs must have been in the vicinity.

The experimenter does have some control over the nature of the
variables representing the numbers of larvae in his plots. For example, he is
at liberty to choose the shape and size of his plots, and this will influence
the counts of larvae. However, the element of contagion in the distribution
of the number of larvae is intimately connected with the nature of the
species and could not be altered at the experimenter’s command.
Therefore, if the statistical tests based on the hypothesis that the variables
follow the Poisson Law are not applicable, the only way out of the difficulty
is to modify or adapt the theory to the enforced circumstances of
experimentation. This process must begin by deducing formulae representing the
frequency function of the random variable X determined by an experiment
as just described: a batch of eggs is laid at a spot selected “at random”
and then a random number of larvae surviving up to the moment of counts
spread, also at random, in the vicinity of the place of hatching. Formulae
characterizing such “contagious” distributions have been deduced [8], [12].
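The effect of contagion can be illustrated by simulation. The sketch assumes, purely for illustration, that the number of egg batches affecting a plot is a Poisson variable with mean λ₁ and that each batch contributes a Poisson number of surviving larvae with mean λ₂. This Poisson sum of Poisson variables is often called the Neyman Type A distribution; the formulae in the works cited may rest on more detailed assumptions. Its variance λ₁λ₂(1 + λ₂) far exceeds its mean λ₁λ₂, whereas a Poisson variable has variance equal to its mean. The values λ₁ = 1 and λ₂ = 4 are arbitrary.

```python
import math
import random

def poisson(mean, rng):
    """Poisson variate by Knuth's product-of-uniforms method (fine for small means)."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def contagious_count(batches_mean, larvae_mean, rng):
    """Hypothetical plot count: a Poisson number of egg batches, each contributing
    a Poisson number of surviving larvae."""
    return sum(poisson(larvae_mean, rng) for _ in range(poisson(batches_mean, rng)))

def mean_and_variance(xs):
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)

rng = random.Random(11)
clustered = [contagious_count(1.0, 4.0, rng) for _ in range(20_000)]
scattered = [poisson(4.0, rng) for _ in range(20_000)]
print("contagious:", mean_and_variance(clustered))  # theory: mean 4, variance 4 * (1 + 4) = 20
print("Poisson:   ", mean_and_variance(scattered))  # theory: mean 4, variance 4
```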

In many cases, particularly in laboratory experimentation, the nature 
of the observable random variables is much more under the control of the 
experimenter, and here it is usual to adapt the experimental technique so 
that it agrees with the assumptions of the theory. The devices used depend 
on the particular type of experimentation. One of them, however, has 
very general application, and is important enough for every student of 
statistics to be familiar with it. This device is due to R. A. Fisher, and is 
known under the label of “principle of randomization.”

When discussing the possibility that the nonstatistical primary
hypothesis ℋ may be true while the statistical hypothesis H₁ is false, we
contemplated the possibility that the subject’s assertion, “In this cup the
tea was poured first and milk was added later,” may be caused not by
the specific taste of tea but by the association between the method of
making tea and some extraneous factor such as the thickness of the china.
Assume then that the two cups repeatedly used in the experiment described
in subsection 5-2-2 differ by an unsuspected but noticeable factor F (e.g.,
thickness of the china) and that the discriminating ability of the Lady is
in reality directed towards the unequal intensity of this factor F rather
than, or perhaps in addition to, the method of making tea. In order to
make the situation clear let us denote the two cups by the letters C₁ and
C₂. Further, we have to distinguish two situations.

1. In cup Cl tea is poured first and milk added later. In cup Ca milk is 
poured first and tea added later. 

2. Opposite of situation 1. 

Denote by $P_1$ and $P_2$ the probabilities that the Lady will make a correct
classification of cups in the two situations, respectively. Obviously, if the
Lady does not taste the difference between the two methods of making
tea and bases her verdict on the subconscious appraisal of the factor F,
then $P_2 = 1 - P_1$. Further, if, in addition to the difference in the factor
F, the Lady's verdict is influenced by the difference in the method of making
tea, then $P_2 \neq 1 - P_1$ and we will write

$$P_2 = 1 - P_1 + \delta \neq 1 - P_1.$$

Here $\delta > 0$ and represents the effect of the postulated ability of the Lady
to taste the two methods of making tea. Suppose that out of the n pairs
of cups offered the Lady for tasting, $n_1$ were prepared as described under
(1) and $n_2 = n - n_1$ as described under (2). Finally, let $X_1$ and $X_2$ stand
for the numbers of correct classifications in the two series of attempts,
respectively. Obviously both $X_1$ and $X_2$ are binomial variables with
frequency functions

$$p_{X_1}(k_1 \mid P_1) = C_{n_1}^{k_1} P_1^{k_1} (1 - P_1)^{n_1 - k_1},$$

$$p_{X_2}(k_2 \mid P_2) = C_{n_2}^{k_2} P_2^{k_2} (1 - P_2)^{n_2 - k_2}.$$

The total number of successes in all the $n_1 + n_2 = n$ trials is, then,
the sum of the variables $X_n = X_1 + X_2$ and, as we will prove in the second
volume of this text, is not a binomial variable at all unless $P_1 = P_2$. Thus
the presence of the disturbing factor F may not only affect the
correspondence between the hypotheses $\mathcal{H}$ and $H_1$, but also may cause the
observable variable $X_n$ to follow a probability distribution other than one
of those postulated in the definition of the set of admissible simple
hypotheses.
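The claim that $X_n$ is binomial only when $P_1 = P_2$ is easy to check numerically. The Python sketch below is our illustration, not the author's: it convolves the two binomial frequency functions and compares the variance of the sum with that of the only binomial on $n_1 + n_2$ trials having the same mean. The values $n_1 = n_2 = 4$, $P_1 = .3$, $\delta = .2$ are arbitrary assumptions.

```python
from math import comb

def binom_pmf(n, p):
    """Binomial frequency function C(n, k) p**k (1 - p)**(n - k), k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(f, g):
    """Frequency function of X1 + X2 for independent X1 ~ f and X2 ~ g."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

def mean_var(f):
    """Mean and variance of a frequency function given as a list."""
    m = sum(k * fk for k, fk in enumerate(f))
    return m, sum((k - m) ** 2 * fk for k, fk in enumerate(f))

# Illustrative values (assumptions, not from the text): P2 = 1 - P1 + delta.
n1, n2, P1, delta = 4, 4, 0.3, 0.2
P2 = 1 - P1 + delta

total = convolve(binom_pmf(n1, P1), binom_pmf(n2, P2))
m, v = mean_var(total)
# The only binomial on n1 + n2 trials with this mean has p = m / (n1 + n2).
_, v_binom = mean_var(binom_pmf(n1 + n2, m / (n1 + n2)))
print(f"mean {m:.3f}, variance {v:.3f}, binomial variance {v_binom:.3f}")
```

With these values the sum has variance 1.2 while the matching binomial would have variance 1.92, so no binomial law fits; taking $P_1 = P_2$ directly makes the two agree.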

Now, all these difficulties may be avoided by applying the method of
randomization in the design of the experiment. Imagine that by previous
checks and rechecks it was established that a certain machinery of
performing an experiment insures: (a) a long-run relative frequency of
successes approximately equal to one-half, and (b) an independence of the
result of each trial from the results of all previous trials. This machinery
may be that of tossing a coin, using a roulette, etc. A machinery of this
kind will be labeled a “standard random machine.”

The randomization of the experimental design consists, then, in using a
standard random machine to decide by which method the tea shall be
made in cup $C_1$. We work the standard random machine and if the
outcome is “success,” pour tea first into the cup $C_1$ and then add milk. If
the standard random machine yields “failure,” then the tea in cup $C_1$ is
made by the alternative method. Of course, once the method of making
tea in cup $C_1$ is determined, the tea in cup $C_2$ is made by the alternative
method. As a result of randomization, the assignment of the method of
making tea to a given cup becomes random with the probability for either
method and cup equal to one-half. Upon repeating the reasoning of
subsection 5·2·4, it is easily found that, with the randomized design, the
probability of the Lady's success in classifying any given pair of cups is
the same and is equal to, say, $p = \tfrac{1}{2}(P_1 + P_2)$. Since $P_1$ and $P_2$ satisfy
the condition that

$$P_2 = 1 - P_1 + \delta \geq 1 - P_1,$$

it follows that

$$p = \tfrac{1}{2}(1 + \delta),$$

where $\delta > 0$ or $\delta = 0$ according to whether or not the Lady is able to taste
the difference in making tea. Thus, the results of randomization are:
first, whether there exists a disturbing and irrelevant factor F or not, the
observable random variable $X_n$ is a binomial variable; second, whether
there exists the disturbing factor F or not, if the Lady cannot taste the
difference in making tea so that $\delta = 0$, the statistical hypothesis $H_1$ is
true; if the Lady does taste the difference in making tea so that $\delta > 0$,
the hypothesis $H_1$ is false.
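The effect of randomization can be checked exactly for a small n. The sketch below, with arbitrary illustrative values $P_1 = .3$, $\delta = .2$ and n = 6 (assumptions, not from the text), averages the distribution of the number of successes over all $2^n$ equally likely outcomes of the standard random machine and compares the result with the binomial law with $p = \tfrac12(1 + \delta)$.

```python
from itertools import product
from math import comb

def binom_pmf(n, p):
    """Binomial frequency function for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def add_trial(f, p):
    """Frequency function after one more independent trial with success probability p."""
    g = [0.0] * (len(f) + 1)
    for k, fk in enumerate(f):
        g[k] += fk * (1 - p)
        g[k + 1] += fk * p
    return g

def randomized_pmf(n, P1, P2):
    """Distribution of X_n when a fair coin decides, pair by pair, whether
    situation 1 (success prob. P1) or situation 2 (success prob. P2) is used;
    averaged over all 2**n equally likely coin sequences."""
    total = [0.0] * (n + 1)
    for coins in product((1, 2), repeat=n):
        f = [1.0]
        for s in coins:
            f = add_trial(f, P1 if s == 1 else P2)
        for k, fk in enumerate(f):
            total[k] += fk / 2**n
    return total

P1, delta, n = 0.3, 0.2, 6          # illustrative values (assumptions)
P2 = 1 - P1 + delta
mix = randomized_pmf(n, P1, P2)
target = binom_pmf(n, 0.5 * (1 + delta))
print(max(abs(a - b) for a, b in zip(mix, target)))
```

The printed maximum discrepancy is at the level of floating-point rounding: averaged over the randomization, every pair of cups is classified correctly with the same probability, whatever $P_1$ may be.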

When reading the quotation from Fisher given in subsection 5·2·5, the
reader may have been impressed by his insistence that the treatments be
assigned to the particular cups using some sort of standard random
machine. This is another example of the application of the principle of
randomization. Its purpose is to make certain that, if the Lady does not
taste the difference in making tea, the observable variable will follow the
hypergeometric distribution irrespective of whether or not there exists a
disturbing factor F of the kind described.
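The hypergeometric claim can likewise be verified by enumeration. The sketch assumes Fisher's familiar layout of eight cups, four made each way (the quotation itself is not reproduced in this section), and a Lady who names as “tea first” the four cups carrying a hypothetical factor F, ignoring the taste entirely. Because the random machine makes all 70 arrangements equally likely, the number of her correct “tea first” picks is hypergeometric whichever four cups she names.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

CUPS = range(8)   # assumption: Fisher's layout of 8 cups, 4 made each way

def correct_count_distribution(lady_tea_first):
    """Exact distribution of how many of the Lady's four 'tea first' picks are
    right, when the four 'tea first' cups are chosen by a fair lottery over all
    C(8, 4) = 70 arrangements and her picks do not depend on the taste."""
    counts = [0] * 5
    arrangements = list(combinations(CUPS, 4))
    for tea_first in arrangements:
        counts[len(set(lady_tea_first) & set(tea_first))] += 1
    return [Fraction(c, len(arrangements)) for c in counts]

hypergeometric = [Fraction(comb(4, k) * comb(4, 4 - k), comb(8, 4)) for k in range(5)]

# Hypothetical factor F: the Lady always names the four thick cups {0, 1, 2, 3}.
print(correct_count_distribution({0, 1, 2, 3}) == hypergeometric)  # True
print(correct_count_distribution({0, 1, 2, 3})[4])                 # 1/70
```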

5·2·7. Screening for tuberculosis. Bivariate case. In all the previous
illustrations there was just one observable random variable and,
therefore, the sample space could be represented conveniently by several dots
on a straight line. Also, intuitive considerations suggested the general
character of the critical region appropriate to test a given statistical
hypothesis. This is frequently the case in practical work. However, in many
other cases it is necessary to consider simultaneously several observable
random variables and then, occasionally, the choice of an appropriate
critical region is not at all obvious.

To illustrate a situation of this kind we will return to the example of
subsection 5·2·1 and modify it slightly by assuming that the screening
procedure for a particular disease D involves repeated application of not
one but of two different tests to be denoted by $T_1$ and $T_2$. Specifically,
we will assume that each patient is subjected to $n_1 = 5$ independent
applications of test $T_1$ and to $n_2 = 5$ independent applications of test $T_2$.
Simplifying matters just as was done in subsection 5·2·1, we will assume
further that

for an individual free from disease D,

  the probability that a single application of test $T_1$
  will give verdict “positive” is $p_{11} = .1$;

  the probability that a single application of test $T_2$
  will give verdict “positive” is $p_{21} = .2$;

and,

for an individual suffering from D,

  the probability that a single application of test $T_1$
  will give verdict “positive” is $p_{12} = .4$;

  the probability that a single application of test $T_2$
  will give verdict “positive” is $p_{22} = .9$.

Denote by $X_1$ and $X_2$, respectively, the number of positive outcomes
in the 5 tests $T_1$ and in the 5 tests $T_2$ performed on a given individual.

There are, then, two observable random variables, $X_1$ and $X_2$, in this
case. Each of these variables is capable of assuming six different values
$X_1 = k_1$ and $X_2 = k_2$, where $k_1, k_2 = 0, 1, 2, 3, 4, 5$. Since each possible
value of $X_1$ can combine with any possible value of $X_2$, the sample space
W is composed of 36 sample points which can be conveniently denoted
simply by $(k_1, k_2)$ and represented on a graph by dots in a rectangular
lattice as illustrated in Figure 33. As in subsection 5·2·1, the set of
admissible simple hypotheses Ω is composed of only two elements:



Figure 33. Sample Space of Observable Random Variables $X_1$ and $X_2$.

The hypothesis tested H: the individual suffers from D, and,

The alternative hypothesis $\bar H$: the individual does not suffer from D.

Whichever of these two hypotheses is true, the assumptions of
independence of repeated tests and of constant probabilities of positive outcomes of
the tests imply that, say,

$$P\{(X_1 = k_1)(X_2 = k_2)\} = P\{X_1 = k_1\}\,P\{X_2 = k_2\}, \tag{5·2·8}$$



where both $P\{X_1 = k_1\}$ and $P\{X_2 = k_2\}$ are binomial probabilities.
Thus, on the hypothesis tested, we have, say,

$$p_{X_1,X_2}(k_1, k_2 \mid H) = P\{(X_1 = k_1)(X_2 = k_2) \mid H\} = C_5^{k_1} p_{12}^{k_1}(1 - p_{12})^{5-k_1} \times C_5^{k_2} p_{22}^{k_2}(1 - p_{22})^{5-k_2}. \tag{5·2·9}$$

Also, on the alternative hypothesis,

$$p_{X_1,X_2}(k_1, k_2 \mid \bar H) = P\{(X_1 = k_1)(X_2 = k_2) \mid \bar H\} = C_5^{k_1} p_{11}^{k_1}(1 - p_{11})^{5-k_1} \times C_5^{k_2} p_{21}^{k_2}(1 - p_{21})^{5-k_2}.$$

The functions $p_{X_1,X_2}(k_1, k_2 \mid H)$ and $p_{X_1,X_2}(k_1, k_2 \mid \bar H)$ thus defined are
termed the joint frequency functions of the variables $X_1$ and $X_2$ on the
two hypotheses H and $\bar H$, respectively. Tables 5·3 and 5·4 give the values
of these frequency functions computed for all the 36 possible sample points.


Table 5·3

Frequency Function $p_{X_1,X_2}(k_1, k_2 \mid H)$

           k1 = 0     k1 = 1     k1 = 2     k1 = 3     k1 = 4     k1 = 5
k2 = 5    .045,917   .153,055   .204,073   .136,049   .045,350   .006,047
k2 = 4    .025,509   .085,031   .113,374   .075,583   .025,194   .003,359
k2 = 3    .005,669   .018,896   .025,194   .016,796   .005,599   .000,746
k2 = 2    .000,630   .002,100   .002,799   .001,866   .000,622   .000,083
k2 = 1    .000,035   .000,117   .000,156   .000,104   .000,035   .000,005
k2 = 0    .000,001   .000,003   .000,003   .000,002   .000,001   .000,000
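Tables 5·3 and 5·4 can be regenerated from (5·2·8) and the four probabilities $p_{11}, p_{21}, p_{12}, p_{22}$. A minimal Python sketch follows; the names are ours, and `table(HEALTHY)` gives the entries of Table 5·4, which is referred to but not reproduced in this section.

```python
from math import comb

def binom(n, p, k):
    """Binomial probability C(n, k) p**k (1 - p)**(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N1 = N2 = 5                    # applications of tests T1 and T2
SICK = (0.4, 0.9)              # p12, p22: hypothesis tested H
HEALTHY = (0.1, 0.2)           # p11, p21: alternative H-bar

def joint(k1, k2, p):
    """Joint frequency function (5.2.8): product of two binomial probabilities."""
    return binom(N1, p[0], k1) * binom(N2, p[1], k2)

def table(p):
    """Rows k2 = 5, 4, ..., 0 and columns k1 = 0, ..., 5, laid out as in Table 5.3."""
    return [[joint(k1, k2, p) for k1 in range(N1 + 1)] for k2 in range(N2, -1, -1)]

for k2, row in zip(range(N2, -1, -1), table(SICK)):
    print(f"k2 = {k2}:", "  ".join(f"{x:.6f}" for x in row))
```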


Recalling details of the similar problem discussed in subsection 5·2·1,
the reader will notice that now, as a result of there being two observable
random variables instead of one, the number of possible sample points
has increased from six to thirty-six and that the probabilities of the
individual points have markedly decreased. It is easy to visualize that, should
the study of some problem require simultaneous consideration of not two
but, say, ten observable random variables, then, unless special methods are
used to avoid the difficulty, the multiplicity of possible sample points
would be quite embarrassing. This is one of the difficulties mentioned at
the beginning of this subsection. The other difficulty, which also requires
special study, is connected with the problem of the critical region
appropriate to test the hypothesis tested.

In subsection 5·2·1 it was quite clear that, the fewer the positive
outcomes of the X-ray examinations, the more likely it is that the individual
concerned has no tuberculosis. Therefore, whatever the number n of
independent X-ray examinations, the critical region w appropriate to test the
hypothesis that the individual suffers from tuberculosis is defined by
the inequality $k \leq t$, where t is a number to be determined in accordance
with the chosen level of significance α. In other words, in the conditions
of subsection 5·2·1 it was clear that the appropriate criterion on which
to reject or to accept the hypothesis H was the value of X = total number
of positive outcomes of n independent X-ray examinations.

In the present case intuitive considerations are not so explicit on the
question of what combinations of values of $X_1$ and $X_2$ are the most
indicative of the falsehood of H. The reason is that the two tests $T_1$ and $T_2$
have distinctly different properties. Test $T_1$ is reasonably safe when appli-






actually determined by observation. In order to interpret this rule, we
visualize a boundary $t_1$ between the “small” and “large” probabilities of
possible sample points as determined by the hypothesis tested. Then the
critical region, say $w_1$, corresponding to Rule 1 includes all points $(k_1, k_2)$
for which

$$p_{X_1,X_2}(k_1, k_2 \mid H) < t_1$$

and none of those for which

$$p_{X_1,X_2}(k_1, k_2 \mid H) > t_1.$$

If there are points $(k_1, k_2)$ for which

$$p_{X_1,X_2}(k_1, k_2 \mid H) = t_1,$$

then these can be arbitrarily distributed between the critical region $w_1$
and its complement $W - w_1$ so as to bring the probability of the first kind
of error as close as possible to the chosen level of significance α. For
example, let α = .001. To determine $w_1$ we sort the possible sample points
in order of increasing probabilities as shown in Table 5·3 and add these
probabilities in cumulative sums.

Possible Sample Point (k1, k2)    p(k1, k2 | H)    Cumulative Sums

         (5, 0)                     .000,000          .000,000
         (4, 0)                     .000,001          .000,001
         (0, 0)                     .000,001          .000,002
         (3, 0)                     .000,002          .000,004
         (2, 0)                     .000,003          .000,007

                          etc., etc.


This process continues until the cumulative sum exceeds α = .001. Then
$t_1$ is equal to the last probability added with which the cumulative sum is
closest to α. The totality of the corresponding points forms
the critical region $w_1$. The student will easily verify that $t_1 = .000,622$:
adding this probability raises the cumulative sum from .000,545 to .001,167,
and the latter is the closer to α. Accordingly, the critical region $w_1$
includes all sample points $(k_1, k_2)$ with $k_2 = 0$ or 1 and, in addition,
the two points $(k_1 = 4;\, k_2 = 2)$ and $(k_1 = 5;\, k_2 = 2)$. The probability of an
error of the first kind and the value of the power function of the region
when $\bar H$ is true (which we shall call the power) are obtained by adding up
the probabilities of all the points included in $w_1$ as determined by H (in
Table 5·3) and by $\bar H$ (in Table 5·4), respectively. We have

$$P\{E \in w_1 \mid H\} = .001,167, \tag{5·2·10}$$

$$\beta(w_1) = P\{E \in w_1 \mid \bar H\} = .737,373. \tag{5·2·11}$$

In Figure 34a the shaded area represents the critical region $w_1$.
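The construction of $w_1$ can be sketched in Python as follows. The helper `closest_prefix` (a name of ours) adds points in order of increasing probability under H, stops once α is exceeded, and keeps the prefix whose cumulative sum is closest to α, which is how $t_1 = .000,622$ arises above.

```python
from math import comb

def binom(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

POINTS = [(k1, k2) for k1 in range(6) for k2 in range(6)]
P_H = {e: binom(5, 0.4, e[0]) * binom(5, 0.9, e[1]) for e in POINTS}     # H: suffers from D
P_HBAR = {e: binom(5, 0.1, e[0]) * binom(5, 0.2, e[1]) for e in POINTS}  # H-bar: free from D

def closest_prefix(order, alpha):
    """Add points in the given order and keep the prefix whose cumulative
    probability under H is closest to alpha (stopping once alpha is exceeded)."""
    best, best_gap, cum = [], alpha, 0.0
    for i, e in enumerate(order):
        cum += P_H[e]
        if abs(cum - alpha) < best_gap:
            best, best_gap = order[: i + 1], abs(cum - alpha)
        if cum > alpha:
            break
    return best

# Rule 1: order the sample points by increasing probability under H.
w1 = closest_prefix(sorted(POINTS, key=P_H.get), alpha=0.001)
size = sum(P_H[e] for e in w1)       # P{E in w1 | H}
power = sum(P_HBAR[e] for e in w1)   # P{E in w1 | H-bar}
print(len(w1), round(size, 6), round(power, 6))
```

It should print 14 points, a size of about .001,165 and a power of about .737,374; the small differences from (5·2·10) and (5·2·11) arise because the text sums table entries already rounded to six decimals.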









Figure 34. Sample Space W of Random Variables $X_1$ and $X_2$ and Critical
Regions $w_1$, $w_2$, $w_3$, $w_4$.

Rule 2. Reject H when the criterion $Y = 2X_1 + X_2$ is “too small.”

To interpret Rule 2 we proceed as above and visualize a number $t_2$
which represents the limit between such values of the criterion Y which
are “too small” and the others which are not “too small.” Then the critical
region $w_2$ corresponding to Rule 2 contains all the points $(k_1, k_2)$ for
which

$$2k_1 + k_2 < t_2$$

and none of those for which

$$2k_1 + k_2 > t_2.$$

If there be one or more points for which $2k_1 + k_2 = t_2$, then these
points are distributed between $w_2$ and $W - w_2$ so as to comply, as nearly
as possible, with the imposed level of significance α = .001. To determine
the critical region $w_2$, we arrange the possible sample points $(k_1, k_2)$ in
the order of increasing value of the criterion Y and form the cumulative
sums of their probabilities up to the moment when this sum exceeds the
value of α = .001. This is done most conveniently by making up a little
table of values of the criterion Y corresponding to each possible sample
point.

Table 5·5

Values of Criterion Y

           k1 = 0   k1 = 1   k1 = 2   k1 = 3   k1 = 4   k1 = 5
k2 = 5        5        7        9       11       13       15
k2 = 4        4        6        8       10       12       14
k2 = 3        3        5        7        9       11       13
k2 = 2        2        4        6        8       10       12
k2 = 1        1        3        5        7        9       11
k2 = 0        0        2        4        6        8       10

It follows that the criterion Y arranges the possible sample points in
the following order:

 Y    Possible Sample Point (k1, k2)    p(k1, k2 | H)    Cumulative Sums
 0             (0, 0)                     .000,001          .000,001
 1             (0, 1)                     .000,035          .000,036
 2             (1, 0)                     .000,003          .000,039
 2             (0, 2)                     .000,630          .000,669
 3             (1, 1)                     .000,117          .000,786
 3             (0, 3)                     .005,669          .006,455


In this case the level of significance α = .001 is exceeded by the cumulative
sum when we add the probability of the second of the two points for
which Y = 3. It is clear that the closest approximation to α is reached
when this last point (0, 3) is omitted. Thus, the critical region $w_2$ is defined
to contain the five points (0, 0), (0, 1), (1, 0), (0, 2), and (1, 1). With this
critical region, the individual subjected to the two tests $T_1$ and $T_2$ will
be given a clean bill of health only if these tests result in some one of the
five combinations of values of $X_1$ and $X_2$ just enumerated.

The probability of an error of the first kind and the power of the critical
region $w_2$ are, respectively,

$$P\{E \in w_2 \mid H\} = .000,786,$$

$$\beta(w_2) = P\{E \in w_2 \mid \bar H\} = .798,153.$$

By comparing these values with (5·2·10) and (5·2·11) it will be seen
that the critical region $w_2$ is preferable to $w_1$. In fact, compared with $w_1$,
the region $w_2$ combines a greater power with a smaller probability of an
error of the first kind.
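The same closest-prefix procedure reproduces $w_2$ once the points are ordered by Y. Within a tie in Y the sketch places the point less probable under H first; this tie-breaking is our assumption, chosen to match the ordering shown above, since the text allows tied points to be split arbitrarily.

```python
from math import comb

def binom(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

POINTS = [(k1, k2) for k1 in range(6) for k2 in range(6)]
P_H = {e: binom(5, 0.4, e[0]) * binom(5, 0.9, e[1]) for e in POINTS}     # H
P_HBAR = {e: binom(5, 0.1, e[0]) * binom(5, 0.2, e[1]) for e in POINTS}  # H-bar

def closest_prefix(order, alpha):
    """Keep the prefix of `order` whose cumulative probability under H is closest to alpha."""
    best, best_gap, cum = [], alpha, 0.0
    for i, e in enumerate(order):
        cum += P_H[e]
        if abs(cum - alpha) < best_gap:
            best, best_gap = order[: i + 1], abs(cum - alpha)
        if cum > alpha:
            break
    return best

# Rule 2: increasing Y = 2*k1 + k2; ties broken by smaller probability under H.
order = sorted(POINTS, key=lambda e: (2 * e[0] + e[1], P_H[e]))
w2 = closest_prefix(order, alpha=0.001)
size = sum(P_H[e] for e in w2)
power = sum(P_HBAR[e] for e in w2)
print(w2, round(size, 6), round(power, 6))
```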

Rule 3. Reject H when the criterion $Z = X_1 + X_2$ is “too small.”

Rule 4. Reject H when the criterion $U = X_1 + 2X_2$ is “too small.”

Proceeding in a way quite similar to the one just described, we find that
the critical regions $w_3$ and $w_4$ corresponding to Rules 3 and 4, respectively,
insure the following probabilities:

$$P\{E \in w_3 \mid H\} = .000,947, \qquad P\{E \in w_3 \mid \bar H\} = .854,555,$$

$$P\{E \in w_4 \mid H\} = .000,948, \qquad P\{E \in w_4 \mid \bar H\} = .854,705.$$

Figure 34 illustrates the four critical regions suggested. In each case the
critical region is represented by the shaded area. It is seen that, of the four
critical regions, $w_3$ and $w_4$ are of almost equal value and are preferable to
$w_1$ and $w_2$.
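The four regions can be compared side by side. In the sketch below $w_1$ and $w_2$ are taken exactly as described above; the text does not list the points of $w_3$ and $w_4$, so the sketch uses the regions that reproduce the reported probabilities (an inference on our part): $Z \le 2$ together with the tied points (3, 0) and (2, 1), and $U \le 4$.

```python
from math import comb

def binom(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

POINTS = [(k1, k2) for k1 in range(6) for k2 in range(6)]
P_H = {e: binom(5, 0.4, e[0]) * binom(5, 0.9, e[1]) for e in POINTS}     # H
P_HBAR = {e: binom(5, 0.1, e[0]) * binom(5, 0.2, e[1]) for e in POINTS}  # H-bar

REGIONS = {
    # w1 and w2 as listed in the text
    "w1": {e for e in POINTS if e[1] <= 1} | {(4, 2), (5, 2)},
    "w2": {(0, 0), (0, 1), (1, 0), (0, 2), (1, 1)},
    # w3 and w4 inferred (assumption) from the probabilities reported in the text
    "w3": {e for e in POINTS if e[0] + e[1] <= 2} | {(3, 0), (2, 1)},
    "w4": {e for e in POINTS if e[0] + 2 * e[1] <= 4},
}

# (size, power) = (P{E in w | H}, P{E in w | H-bar}) for each region
results = {
    name: (sum(P_H[e] for e in w), sum(P_HBAR[e] for e in w))
    for name, w in REGIONS.items()
}
for name, (size, power) in results.items():
    print(f"{name}: size {size:.6f}, power {power:.6f}")
```

The exact sums agree with the figures in the text to within a few units in the sixth decimal place, the text having added entries already rounded to six decimals.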


PROBLEMS AND EXERCISES 

Consider the rules of inductive behavior described below in Problems 1 to 
3 and answer the following questions: 

(i) What is (are) the observable random variable (variables) and what 
is the corresponding frequency function? 

(ii) What is the set of possible actions contemplated? Does a rule for
choosing a particular action amount to a test of a statistical
hypothesis? Why?

(iii) What is the set of admissible simple hypotheses? 

(iv) What is the hypothesis tested? Why? 

(v) What is the sample space? 

(vi) What is the critical region? 

(vii) Compute the probability of an error of the first kind, and plot the
power function of the test.

1. A consumer contemplates purchasing a consignment of N manufactured
products. When the price is set, it is taken into account that some
of the items purchased may be defective. However, the purchaser hopes
that the number of defective items will not exceed a proportion p of the
total. As a partial check it is agreed that a sample of n items will be selected
at random from the consignment and subjected to detailed inspection. If
the number X of defective items found in the sample does not exceed a
bound t, set in advance, then the whole consignment will be accepted and
paid for by the consumer. Otherwise the consignment will be rejected.

In actual practice, agreements of the above kind are ordinarily made
with regard to large consignments of thousands of items. Also, the size n
of the sample is usually considerable. As a matter of exercise take N = 10,
n = 4, t = 1 and p = .3.

2. For the purposes of some chemical industry, it is desirable that the
water used contain less than $m_0$ bacteria per unit volume. In order to guard
against excessive contamination by these bacteria, the following routine
test is performed. A total of n samples, each of volume v, are taken from
the tankful of water and each sample is added to a separate test tube
containing an appropriate nutrient. The test tubes are kept in the
temperature most favorable for bacterial growth. If a given sample is “fertile”
(i.e., if it contains at least one living bacterium), then a colony of bacteria
will grow within a few hours and the originally clear nutrient will become
opaque.

The tankful of water is judged satisfactory and used for production if
the number X of fertile samples does not exceed a preassigned number t.
Otherwise, steps are taken to purify the water.

(a) Let $m_0 = v = 1$, n = 10, t = 3.

(b) Let $m_0 = 1$, v = 2, n = 8, t = 2.

Hint: Refer to the Problems at the end of Chapter 4. Purification of 
water involves some trouble and expense. However, if the water used for 
production contains an excessive density of bacteria, then the loss incurred 
will be much greater. 

3. Consider the following modification of the design of experiment
described in subsection 5·2·4. The purpose of the experiment is to test the
Lady's claim that by tasting she can identify the method of making tea.
The experiment consists of letting the Lady taste the tea out of n cups
offered to her one at a time. The Lady is not informed about the number
of cups of tea which will be made by either of the two methods. However,
the experimenter selects arbitrarily two numbers $n_1$ and $n_2$ such that
$n_1 + n_2 = n$ and decides that in $n_1$ cups the tea will be made by the first
method and in the other cups it will be made by the second method.
The $n_1 + n_2 = n$ cups are then presented to the Lady in a random order.
Care is taken to insure the independence of the successive classifications
of cups. Let $X_1$ and $X_2$ denote the number of successes in identifying,
respectively, the first and the second method of making tea. Also, let $p_1$
and $p_2$ denote the probabilities of the Lady's success in identifying the
first and the second method of making tea, respectively. The Lady claims
that $p_1 > \tfrac12$ and $p_2 > \tfrac12$. Also it is taken for granted that none of the p's
is less than one-half. Consequently, if the Lady has no sensory perception,
then, as discussed, $p_1 = p_2 = \tfrac12$. One experimenter $A_1$ is prepared to accept
the existence of this particular kind of sensory perception if the sum
$X_1 + 2X_2$ exceeds a certain specified number $t_1$. Another experimenter
$A_2$ is prepared to make the same decision if the sum $X_1 + X_2$ exceeds a
fixed limit $t_2$. The third experimenter $A_3$ argues that real “evidence” of
the sensory perception will be present only if either $X_1$ or $X_2$ exceeds a
limit $t_3$.

Assume as given that the first method of making tea is more difficult
to identify than the second and that $\tfrac12 \leq p_1 \leq \tfrac34$. Furthermore, whatever
be $p_1$, within the above limits, assume that the corresponding value of
$p_2$ is

$$p_2 = \tfrac12 + 2\left(p_1 - \tfrac12\right).$$

Answer all the questions asked, assuming Ui = 7^2 = = 11, ^2 = 7 

and <3 = 4. 

4. The experiment described in Problem 3 may give misleading results 
because of some disturbing factor F. For example, when deciding that 
the tea in a given cup was made by the first method, the Lady may
subconsciously react not to the taste of tea but to the thickness of china in
which it is served, etc. How could one modify the design of the experiment 
in order to eliminate the disturbing factors F such as the one mentioned? 

5. The critical regions $w_1$, $w_2$, $w_3$ and $w_4$, considered in subsection
5·2·7, correspond (approximately) to the level of significance α = .001.
Find critical regions determined by the same criteria but corresponding
(approximately) to the level of significance α = .01. Determine the power
of these critical regions and indicate which of them is preferable to the
others.

6. Consider the problem treated in subsection 5·2·7 but assume
different values for $p_{11}$ and $p_{21}$. Namely let $p_{11} = .2$ and $p_{21} = .1$. Verify that
this change in conditions does not modify the hypothesis tested H but
affects the alternative $\bar H$. Compute the power of the four critical regions
$w_1$, $w_2$, $w_3$, and $w_4$, and see whether or not the region $w_4$ continues to
be preferable to the others irrespective of the change in the alternative
hypothesis.

7. Consider Problem 3 and assume that the Lady is convinced that her
ability to identify the two methods of making tea is expressed by the
probabilities $p_1 = .7$ and $p_2 = .9$. For each of the three experimenters
$A_1$, $A_2$, and $A_3$, determine the critical value $t_i$ and the number $n_1 = n_2$
of cups of tea of each kind to be tasted by the Lady so as to conform with
the prescribed level of significance α = .05 and so that, should the Lady's
contention $p_1 = .7$ and $p_2 = .9$ be justified, the probability of her
establishing her claim is at least .95. Compute and plot the power function of
the three critical regions considered. Which of the three experimenters had
the best idea about the appropriate test of the hypothesis that the Lady
cannot identify the method of making tea by tasting?

8. Consider a plant A and a pair of genes g, G. G is dominant and causes
the plant A to bear red flowers. Gene g is recessive and the homozygous
plants gg have white flowers. A number of seeds of plant A are available.
The seeds are of uncertain origin and may include all the three genetical
types GG, gG, and gg. It is desired to use these seeds for breeding and
selection so as to obtain, as far as possible, a consignment C of pure
dominants GG which would bloom red. For this purpose, each available
seed is planted separately from the others and care is taken to eliminate
the possibility of cross fertilization. The recessive plants bloom white
and, therefore, are immediately identified and eliminated. The remaining
plants are either pure dominants GG or hybrids gG. Each plant yields a
bagful of seeds all obtained from self-fertilization. If the mother plant is
GG, then all of its seeds will be GG. If the mother plant is a hybrid gG,
then among its seeds there may be some white blooming recessives gg.
Therefore, it is desired to eliminate the bags of seeds obtained from the
heterozygous plants. For this purpose the breeder selects out of each bag
n seeds and plants them. If at least one of these n seeds yields a white
blooming plant, then this means that the mother plant was a hybrid and
the whole bagful of seeds collected from this plant is discarded. All other
bagfuls of seeds are combined to produce the consignment C. How large
must n be in order that at least 95 per cent of all bagfuls of seeds coming
from the hybrid plants are eliminated?

Assume that the original supply of seeds represents a generation
produced under panmixia with 16 per cent recessive plants. Compute the
probability that a seed included in consignment C will bloom white.


5·3. Simple hypothesis H tested against a single simple
alternative $\bar H$

5·3·1. Best critical regions. Since the purpose of applying a test of a
statistical hypothesis is to decrease the probability of errors in rejecting
this hypothesis or in accepting it, if two different critical regions $w_1$ and
$w_2$ are suggested, both insuring the same probability of error of the first
kind, then the choice between these regions depends on their effectiveness
in controlling errors of the second kind. Starting with these ideas we
introduce the following definition.

Definition 5·8. We say that a region $w_0$ is a best critical region (B.C.R.,
for short) for testing the simple hypothesis H against a simple admissible
hypothesis $\bar H$, with the probability of the first kind of error equal to α', if

$$P\{E \in w_0 \mid H\} = \alpha'$$

and if

$$P\{E \in w_0 \mid \bar H\} \geq P\{E \in w \mid \bar H\} \tag{5·3·1}$$

where w stands for any region such that

$$P\{E \in w \mid H\} \leq \alpha'.$$

The probabilities in (5·3·1) represent the powers of the critical regions
$w_0$ and w, respectively. Thus it can be said that the B.C.R. with the
probability of the first kind of error equal to α' is the “most powerful”
critical region out of those which correspond to the level of significance
α'. We will use the expressions B.C.R. and most powerful critical region
interchangeably.

The above Definition 5·8 describes the region $w_0$ as a B.C.R., not as
the B.C.R., for the reason that, on occasion, there are several different
regions satisfying the definition of B.C.R. Naturally, if there are several
B.C.R.'s insuring the same probability of error of the first kind, the power
of these regions must be the same and, therefore, any one of them is just
as satisfactory for testing H against $\bar H$ as any other, with respect to the
criteria considered.

The following theorem provides an easy method for determining a
B.C.R. It is a simplified version of the so-called Fundamental Lemma in
the theory of testing statistical hypotheses.

Let E stand for a sample point determined by some random variable
or variables. Let e be a possible position of the sample point E, and let
H and $\bar H$ be two different simple statistical hypotheses concerning E. We
shall be concerned with the case where H is the hypothesis tested. Denote
by $p_E(e \mid H)$ and $p_E(e \mid \bar H)$ the frequency functions of E specified by the
hypotheses H and $\bar H$, respectively. For every possible sample point e for
which $p_E(e \mid \bar H) > 0$, define $\lambda^*(e)$ to mean

$$\lambda^*(e) = \frac{p_E(e \mid H)}{p_E(e \mid \bar H)}. \tag{5·3·2}$$

Finally, let t be a positive number and let $w_\lambda(t)$ stand for a region in the
sample space W which includes all possible sample points e' for which

$$\lambda^*(e') < t,$$

and none of those points e'' for which

$$\lambda^*(e'') > t.$$

Whether the region $w_\lambda(t)$ includes any point e''' where $\lambda^*(e''') = t$ or
not is immaterial. Notice that $w_\lambda(t)$ does not contain points in which
$p_E(e \mid \bar H) = 0$.
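For the two-test screening example of subsection 5·2·7, $\lambda^*$ takes a simple form. Working through the binomial probabilities (our own algebra, checked numerically below) gives $\lambda^*(k_1, k_2) = 6^{k_1 + 2k_2}/12^5$, so ordering the sample points by $\lambda^*$ is the same as ordering them by $U = k_1 + 2k_2$, the criterion of Rule 4. The names `lam` and `w_lambda` are ours.

```python
from math import comb

def binom(n, p, k):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def lam(k1, k2):
    """lambda*(e) = p(e | H) / p(e | H-bar) for the two-test screening example."""
    p_H = binom(5, 0.4, k1) * binom(5, 0.9, k2)      # individual suffers from D
    p_Hbar = binom(5, 0.1, k1) * binom(5, 0.2, k2)   # individual free from D
    return p_H / p_Hbar

points = [(k1, k2) for k1 in range(6) for k2 in range(6)]

# Our algebra: lambda* = 6**(k1 + 2*k2) / 12**5, a function of U = k1 + 2*k2 only.
for k1, k2 in points:
    closed = 6 ** (k1 + 2 * k2) / 12 ** 5
    assert abs(lam(k1, k2) - closed) <= 1e-9 * closed

def w_lambda(t):
    """Points with lambda*(e) < t (any points with lambda* == t are left out here)."""
    return {e for e in points if lam(*e) < t}

# lambda* is .005,208 at U = 4 and .031,25 at U = 5, so t = .01 gives U <= 4.
print(sorted(w_lambda(0.01)))
```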

Theorem on Best Critical Regions. Theorem 5·1. Whatever be
the number t, the region $w_\lambda(t)$ is a B.C.R. for testing the simple hypothesis H
against the simple alternative $\bar H$, with the probability of the first kind of error
equal to

$$\alpha' = P\{E \in w_\lambda(t) \mid H\}. \tag{5·3·3}$$

Remark. Theorem 5·1 asserts that there exists no region which is more
powerful than $w_\lambda(t)$ for testing H against $\bar H$ and yet has the same or less
probability of an error of the first kind. However, for a fixed level of
significance α, there may exist regions which are more powerful than $w_\lambda(t)$.
This is discussed in subsection 5·3·5.

Proof. In proving this theorem we shall use the remark made in
subsection 5·1·4 that, whatever the simple hypothesis g and whatever the
region u in the sample space, the probability $P\{E \in u \mid g\}$, given g, that
the sample point E will fall in u is equal to the sum $\sum_{u} p_E(e \mid g)$ of the
values of the frequency function specified by g, this sum extending over
all possible sample points e which belong to u.



Figure 35. Sample Space W and the Regions $w_\lambda(t)$ and w

To prove the theorem it is sufficient to show that, whatever critical
region w may be suggested for testing H against $\bar H$, if w corresponds to the
level of significance α', in the sense that

$$P\{E \in w \mid H\} \leq \alpha', \tag{5·3·4}$$

then the power of w cannot exceed that of $w_\lambda(t)$, so that

$$P\{E \in w \mid \bar H\} \leq P\{E \in w_\lambda(t) \mid \bar H\}.$$

Assume, then, that the region w does satisfy (5·3·4) and consider the
difference, say,

$$D = P\{E \in w_\lambda(t) \mid \bar H\} - P\{E \in w \mid \bar H\}.$$

The theorem will be proved if we show that $D \geq 0$. The regions $w_\lambda(t)$
and w may have some possible sample points in common. Denote by v
the set of these points. Also let $w_\lambda(t) - v$ denote the set of those possible
sample points in $w_\lambda(t)$ which lie outside of w. Similarly, let $w - v$ denote
the set of those possible sample points belonging to w which are not
included in $w_\lambda(t)$. Should there be no points belonging to both $w_\lambda(t)$ and w,
then the set v is empty and the sets $w_\lambda(t) - v$ and $w - v$ coincide with
$w_\lambda(t)$ and w, respectively. The situation where v is not empty is illustrated
in Figure 35. Dots denote the possible sample points e. One shading
represents $w_\lambda(t)$, the other represents w, and the doubly
shaded area represents v.

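The probability $P\{E \in w_\lambda(t) \mid \bar H\}$, given $\bar H$, that the sample point E will
fall within $w_\lambda(t)$ is equal to the sum of terms $p_E(e \mid \bar H)$ extending over all
possible sample points included in $w_\lambda(t)$. Thus,

$$P\{E \in w_\lambda(t) \mid \bar H\} = \sum_{w_\lambda(t)} p_E(e \mid \bar H),$$

or, since $w_\lambda(t)$ is composed of two nonoverlapping parts, $w_\lambda(t) = (w_\lambda(t) - v) + v$,

$$P\{E \in w_\lambda(t) \mid \bar H\} = \sum_{w_\lambda(t) - v} p_E(e \mid \bar H) + \sum_{v} p_E(e \mid \bar H).$$

Similarly

$$P\{E \in w \mid \bar H\} = \sum_{w - v} p_E(e \mid \bar H) + \sum_{v} p_E(e \mid \bar H).$$

Hence, the difference D can be written as

$$D = \sum_{w_\lambda(t) - v} p_E(e \mid \bar H) - \sum_{w - v} p_E(e \mid \bar H).$$

Now, every possible sample point which belongs to $w_\lambda(t) - v$ is an interior
point of $w_\lambda(t)$. Therefore, in accordance with the definition of $w_\lambda(t)$,
whatever possible sample point e' interior to $w_\lambda(t) - v$ we take, the following
relation must be satisfied

$$\lambda^*(e') = \frac{p_E(e' \mid H)}{p_E(e' \mid \bar H)} \leq t$$

or

$$p_E(e' \mid \bar H) \geq \frac{1}{t}\, p_E(e' \mid H).$$

It follows that

$$D \geq \frac{1}{t} \sum_{w_\lambda(t) - v} p_E(e \mid H) - \sum_{w - v} p_E(e \mid \bar H). \tag{5·3·5}$$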


Consider now a point e'' which belongs to $w - v$. According to the
definition of $w - v$, the point e'' must lie outside of $w_\lambda(t)$. Since $w_\lambda(t)$ is
defined to contain all points e for which $\lambda^*(e) < t$, it follows that at the
point e'' either $p_E(e'' \mid \bar H) \neq 0$ and then

$$\lambda^*(e'') = \frac{p_E(e'' \mid H)}{p_E(e'' \mid \bar H)} \geq t$$

or $p_E(e'' \mid \bar H) = 0$. In either case we have

$$p_E(e'' \mid \bar H) \leq \frac{1}{t}\, p_E(e'' \mid H).$$

A larger quantity is subtracted from the right-hand side of (5·3·5) if
$p_E(e \mid \bar H)$ is replaced by $\frac{1}{t}\, p_E(e \mid H)$ for e in $w - v$, and hence

$$D \geq \frac{1}{t} \sum_{w_\lambda(t) - v} p_E(e \mid H) - \sum_{w - v} p_E(e \mid \bar H) \geq \frac{1}{t} \sum_{w_\lambda(t) - v} p_E(e \mid H) - \frac{1}{t} \sum_{w - v} p_E(e \mid H). \tag{5·3·6}$$

The extreme right-hand side of (5·3·6) will not change if we add to it
and subtract from it the same number $\frac{1}{t}\sum_{v} p_E(e \mid H)$. Hence, we may write

$$D \geq \frac{1}{t}\Big[\sum_{w_\lambda(t) - v} p_E(e \mid H) + \sum_{v} p_E(e \mid H)\Big] - \frac{1}{t}\Big[\sum_{w - v} p_E(e \mid H) + \sum_{v} p_E(e \mid H)\Big]. \tag{5·3·7}$$


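However, it is obvious that the two sums in the two pairs of brackets add up to

$$\sum_{w_\lambda(t)} p_E(e\mid H) = P\{E\in w_\lambda(t)\mid H\}$$

and

$$\sum_{w} p_E(e\mid H) = P\{E\in w\mid H\},$$

respectively. Therefore (5·3·7) can be rewritten as

$$D \ge \frac{1}{t}\Big\{\sum_{w_\lambda(t)} p_E(e\mid H) - \sum_{w} p_E(e\mid H)\Big\},$$

or

$$D \ge \frac{1}{t}\big[P\{E\in w_\lambda(t)\mid H\} - P\{E\in w\mid H\}\big].$$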

Because of (5 • 3 • 3) and (5 • 3 • 4) the expression in square brackets is greater 
than or equal to zero. Thus, since t is positive, the difference D must be 
either positive or zero and the theorem is proved. 
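The theorem can also be checked by brute force on a small sample space. The sketch below assumes a hypothetical six-point sample space with invented frequency functions `p_H` and `p_Hbar` (not taken from the book). For each threshold $t$ it forms $w_\lambda(t)$ and enumerates every region $w$ to confirm that none with $P\{E\in w\mid H\} \le P\{E\in w_\lambda(t)\mid H\}$ has a larger power. Exact fractions are used so that ties are not disturbed by rounding.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical frequency functions on a six-point sample space (invented numbers).
p_H    = {"a": F(1, 20), "b": F(1, 10), "c": F(3, 20), "d": F(1, 5),  "e": F(1, 4),  "f": F(1, 4)}
p_Hbar = {"a": F(3, 10), "b": F(1, 4),  "c": F(1, 5),  "d": F(1, 10), "e": F(1, 10), "f": F(1, 20)}

def lam(e):
    """lambda*-criterion p(e | H) / p(e | Hbar); every p(e | Hbar) > 0 here."""
    return p_H[e] / p_Hbar[e]

def prob(region, p):
    return sum((p[e] for e in region), F(0))

def w_lambda(t):
    """The region w_lambda(t): all points with lambda*(e) <= t."""
    return [e for e in p_H if lam(e) <= t]

def theorem_holds(t):
    """True if no region w with P{E in w | H} <= P{E in w_lambda(t) | H}
    has a larger power P{E in w | Hbar} than w_lambda(t)."""
    w_t = w_lambda(t)
    alpha, power = prob(w_t, p_H), prob(w_t, p_Hbar)
    pts = list(p_H)
    return not any(prob(w, p_H) <= alpha and prob(w, p_Hbar) > power
                   for r in range(len(pts) + 1) for w in combinations(pts, r))

print(all(theorem_holds(lam(e)) for e in p_H))   # True
```

Enumerating all $2^n$ regions is practical only for tiny sample spaces; the point of the theorem is precisely that this search is unnecessary.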


5·3·2. Method of constructing best critical regions. The theorem just proved gives an easy method of constructing B.C.R.'s. A simple interpretation of this theorem is that the desirability of including a given point $e$ in the critical region depends on the value of the criterion $\lambda^*(e)$: the smaller the value of $\lambda^*(e)$, the more desirable it is that $e$ should be within the critical region. Therefore, given the frequency function $p_E(e\mid H)$ specified by the hypothesis tested $H$ and the frequency function $p_E(e\mid\bar H)$ specified by the simple alternative $\bar H$, in order to construct the B.C.R. for testing $H$ against $\bar H$ we may proceed as follows.

(i) Compute the quotient $\lambda^*(e) = p_E(e\mid H)/p_E(e\mid\bar H)$ for all the points forming the sample space $W$ for which $p_E(e\mid\bar H) > 0$.

(ii) Arrange and number the possible sample points $e$ in the order of increasing values of $\lambda^*(e)$, so that for $e_1, e_2, e_3, \dots, e_\nu, \dots$

$$\lambda^*(e_1) \le \lambda^*(e_2) \le \dots \le \lambda^*(e_\nu) \le \lambda^*(e_{\nu+1}) \le \dots.$$

If two or more points, say $e'$, $e''$, $e'''$, correspond to the same value of $\lambda^*(e)$, so that $\lambda^*(e') = \lambda^*(e'') = \lambda^*(e''') = \dots$, then the order of these particular points is immaterial.

(iii) Include in the critical region $w_\lambda(t)$ enough points $e_1, e_2, \dots, e_\nu$ to conform as closely as possible to the preassigned level of significance $\alpha$.

Table 5·3 gives the numerical values of the frequency function of the observable random variables as specified by the hypothesis tested $H$, and Table 5·4 gives the numerical values of the frequency function of the same random variables specified by the alternative hypothesis $\bar H$. Using these tables, we perform step (i) and compute for each possible sample point $e$ the value of the criterion $\lambda^*(e)$ (of the $\lambda^*$-criterion, for short). For a few points for which $p_{X_1,X_2}(e\mid\bar H)$ is so small that Table 5·4 lists it as equal to zero, the computation of the $\lambda^*$-criterion requires a more precise evaluation of the probability $p_{X_1,X_2}(e\mid\bar H)$. The values of the $\lambda^*$-criterion are given in Table 5·6.

Table 5·6
Values of the λ*-criterion, λ*(k1, k2)

            k1 = 0     k1 = 1     k1 = 2     k1 = 3     k1 = 4     k1 = 5
k2 = 5      243        1,458      8,748      52,488     314,928    1,889,568
k2 = 4      6.75       40.5       243        1,458      8,748      52,488
k2 = 3      .187,5     1.125      6.75       40.5       243        1,458
k2 = 2      .005,208   .031,25    .187,5     1.125      6.75       40.5
k2 = 1      .000,145   .000,868   .005,208   .031,25    .187,5     1.125
k2 = 0      .000,004   .000,024   .000,145   .000,868   .005,208   .031,25

It is seen that there are several groups of points for which the value of $\lambda^*(e)$ is exactly the same. Thus, the points within each such group may be ordered in an arbitrary way. In performing step (ii) we keep track of this circumstance and connect the interchangeable points of constant $\lambda^*(e)$ with a vertical bar or bracket extending over all points of each group. The first column of Table 5·7 gives a tentative marking of the possible sample points, the second column gives their coordinates, the third the corresponding value of the criterion $\lambda^*(e)$, the fourth the value of $p_{X_1,X_2}(k_1,k_2\mid H)$, and the fifth the cumulative sum of these values. Finally, in order to decide on the last point to be included in the region, it is convenient to have the power of the test before one's eyes. For this purpose it is useful to add two more columns, the sixth and the seventh, giving, respectively, the values of $p_{X_1,X_2}(k_1,k_2\mid\bar H)$ and the cumulative sums of these values.
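Steps (i) to (iii), together with the columns just described, can be sketched as below. The function names `bcr_table` and `choose_region` are hypothetical, and reading step (iii) as "take the initial segment whose size is closest to $\alpha$" is an assumption; as the discussion of Table 5·7 shows, the final choice among nearly equivalent candidates may also be guided by the power. The four-point frequency functions are invented for illustration.

```python
from fractions import Fraction as F

def bcr_table(p_H, p_Hbar):
    """Steps (i) and (ii): lambda*(e) = p(e|H) / p(e|Hbar) for every point with
    p(e|Hbar) > 0, sorted by increasing lambda*, with both cumulative-sum columns."""
    points = sorted((e for e in p_H if p_Hbar[e] > 0),
                    key=lambda e: p_H[e] / p_Hbar[e])   # ties keep input order (arbitrary)
    rows, cum_H, cum_Hbar = [], F(0), F(0)
    for e in points:
        cum_H += p_H[e]
        cum_Hbar += p_Hbar[e]
        rows.append((e, p_H[e] / p_Hbar[e], p_H[e], cum_H, p_Hbar[e], cum_Hbar))
    return rows

def choose_region(rows, alpha):
    """Step (iii), one possible reading: the initial segment e1, ..., e_nu whose
    size P{E in w | H} is closest to alpha (the shortest segment on ties)."""
    sizes = [F(0)] + [row[3] for row in rows]
    nu = min(range(len(sizes)), key=lambda i: abs(sizes[i] - alpha))
    return [row[0] for row in rows[:nu]]

# Hypothetical four-point sample space (numbers invented for illustration).
p_H    = {"A": F(1, 50), "B": F(3, 50), "C": F(1, 5), "D": F(18, 25)}
p_Hbar = {"A": F(1, 5),  "B": F(3, 10), "C": F(1, 4), "D": F(1, 4)}

rows = bcr_table(p_H, p_Hbar)
for e, lam, pH, cH, pHbar, cHbar in rows:
    print(e, lam, pH, cH, pHbar, cHbar)
print(choose_region(rows, F(1, 10)))   # ['A', 'B']
```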

It is seen that, if the level of significance is set at $\alpha = .001$, then the B.C.R. must include some points with the corresponding value of the criterion $\lambda^*(e) = .03125 = t$. It follows that the B.C.R. must include all the points with $\lambda^*(e) < .03125$, i.e., the points $e_1, e_2, \dots, e_9$. On the other hand, out of the group of three points for which $\lambda^*(e) = .03125$ we can make an arbitrary selection, with the resulting probability of an error of the first kind, combined with the power of the test, being the only consideration. Since the difference between the preassigned $\alpha = .001$ and the probability of an error of the first kind, .001,052, is probably too small to matter in any practical problem, and since laxity in this respect means some gain in power, we choose to include in the critical region $w_\lambda(.03125)$ all the points marked with asterisks. This is, then, a best critical region for testing $H$ against $\bar H$ with the probability of an error of the first kind equal to .001,052. Its power is $\beta = .858,023$. It follows from the theorem on B.C.R.'s that, whatever other region $w$ is suggested with probability of an error of the first kind not exceeding .001,052, the corresponding power will be either less than or, at most, equal to .858,023.

Table 5·7
Construction of the B.C.R.

Point  (k1, k2)  λ*(e)     p(k1,k2 | H)  Cum. sums  p(k1,k2 | H̄)  Cum. sums
e1*    (0,0)     .000,004  .000,001      .000,001   .193,492       .193,492
e2*    (1,0)     .000,024  .000,003      .000,004   .107,495       .300,987
e3*    (2,0)     .000,145  .000,003      .000,007   .023,888       .324,875
e4*    (0,1)     .000,145  .000,035      .000,042   .241,865       .566,740
e5*    (3,0)     .000,868  .000,002      .000,044   .002,654       .569,394
e6*    (1,1)     .000,868  .000,117      .000,161   .134,369       .703,763
e7*    (4,0)     .005,208  .000,001      .000,162   .000,147       .703,910
e8*    (2,1)     .005,208  .000,156      .000,318   .029,860       .733,770
e9*    (0,2)     .005,208  .000,630      .000,948   .120,932       .854,702
e10*   (5,0)     .031,250  .000,000      .000,948   .000,003       .854,705
e11*   (3,1)     .031,250  .000,104      .001,052   .003,318       .858,023
e12    (1,2)     .031,250  .002,100      .003,152   .067,185       .925,208
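Table 5·7 can be reproduced as follows, under the assumption (not stated in this subsection) that each test is applied $n_1 = n_2 = 5$ times, with $p_{12} = .4$, $p_{22} = .9$ under $H$ and $p_{11} = .1$, $p_{21} = .2$ under $\bar H$, the values used in subsection 5·3·3. These reproduce, for instance, $p(0,0\mid\bar H) = .9^5\cdot .8^5 = .193,492$. Exact arithmetic can differ from the printed cumulative sums in the last digit (for $e_{11}$ it gives about .001,051 rather than .001,052), since the printed sums appear to add rounded entries.

```python
from fractions import Fraction as F
from math import comb

# Assumed parameters: five applications of each test T1, T2.
n1, n2 = 5, 5
p11, p21, p12, p22 = F(1, 10), F(1, 5), F(2, 5), F(9, 10)   # Hbar: p11, p21;  H: p12, p22

def binom(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_H(k1, k2):
    return binom(k1, n1, p12) * binom(k2, n2, p22)

def p_Hbar(k1, k2):
    return binom(k1, n1, p11) * binom(k2, n2, p21)

points = [(k1, k2) for k1 in range(n1 + 1) for k2 in range(n2 + 1)]
# Sort by lambda*; ties broken by decreasing k1 to match the printed order.
points.sort(key=lambda e: (p_H(*e) / p_Hbar(*e), -e[0]))

cum_H = cum_Hbar = F(0)
table = []
for e in points[:12]:
    cum_H += p_H(*e)
    cum_Hbar += p_Hbar(*e)
    table.append((e, float(p_H(*e) / p_Hbar(*e)), float(cum_H), float(cum_Hbar)))

for nu, (e, lam, cH, cHbar) in enumerate(table, start=1):
    print(f"e{nu:<3} {e}  lambda*={lam:.6f}  cum H={cH:.6f}  cum Hbar={cHbar:.6f}")
```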

5·3·3. Short cut in constructing the B.C.R. The procedure of constructing the B.C.R. described above involves a considerable amount of tedious arithmetic, and it is appropriate to see how at least some of the computations could be avoided. One such possibility is almost obvious.

The bulk of the computations involved consists of (a) computing the table of values of $p_E(e\mid H)$ for all possible sample points; (b) computing a similar table of values of $p_E(e\mid\bar H)$, again for all points of the sample space $W$; (c) computing the table of values of $\lambda^*(e)$ for all points of $W$. When all this is completed, a considerable number of the values obtained in steps (a), (b) and (c) turn out to be of no further use, because they correspond to points with values of $\lambda^*(e)$ too large for inclusion in the critical region. It follows that the amount of labor could be considerably reduced if we could find a method of sorting the possible sample points $e$ according to increasing values of $\lambda^*(e)$ without computing the probabilities $p_E(e\mid H)$ and $p_E(e\mid\bar H)$ and without calculating the values of $\lambda^*(e)$. Once this sorting is achieved, the B.C.R. can be determined by computing $p_E(e\mid H)$ and $p_E(e\mid\bar H)$ and their cumulative sums for only those few sample points which are known to correspond to the smallest values of $\lambda^*(e)$.

The sorting of the possible sample points according to the values of $\lambda^*(e)$ frequently is very easy when one takes into account the formulae which determine the probabilities $p_E(e\mid H)$ and $p_E(e\mid\bar H)$ as functions of the coordinates of the point $e$. Substituting these formulae into the inequality

$$\lambda^*(e) = \frac{p_E(e\mid H)}{p_E(e\mid\bar H)} \le t,$$

which determines the B.C.R., we obtain an inequality in terms of the coordinates of $e$, and all that remains is to simplify this inequality in a manner explained in detail in courses on algebra. We will illustrate the procedure on the example of subsection 5·2·7, involving two observable random variables.

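Copying formulae (5·2·8) and (5·2·9), and writing $n_1$ and $n_2$ instead of 5 for the number of applications of the two tests $T_1$ and $T_2$, we have

$$p_{X_1,X_2}(k_1,k_2\mid H) = C_{n_1}^{k_1}\,p_{12}^{k_1}(1-p_{12})^{n_1-k_1}\;C_{n_2}^{k_2}\,p_{22}^{k_2}(1-p_{22})^{n_2-k_2}$$

and

$$p_{X_1,X_2}(k_1,k_2\mid\bar H) = C_{n_1}^{k_1}\,p_{11}^{k_1}(1-p_{11})^{n_1-k_1}\;C_{n_2}^{k_2}\,p_{21}^{k_2}(1-p_{21})^{n_2-k_2}$$

for $k_1 = 0, 1, 2, \dots, n_1$ and $k_2 = 0, 1, 2, \dots, n_2$. Thus the $\lambda^*$-criterion is the following function of $k_1$ and $k_2$:

$$\lambda^*(k_1,k_2) = \frac{p_{X_1,X_2}(k_1,k_2\mid H)}{p_{X_1,X_2}(k_1,k_2\mid\bar H)} = \frac{p_{12}^{k_1}(1-p_{12})^{n_1-k_1}\,p_{22}^{k_2}(1-p_{22})^{n_2-k_2}}{p_{11}^{k_1}(1-p_{11})^{n_1-k_1}\,p_{21}^{k_2}(1-p_{21})^{n_2-k_2}}.$$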

Since we are interested in the relation between $\lambda^*(k_1,k_2)$ and the coordinates $k_1$ and $k_2$, it is expedient to sort out the particular factors in the right-hand side and combine all those which are raised to the power $k_1$ or $k_2$. As a result we obtain the following formula

$$\lambda^*(k_1,k_2) = \Big(\frac{1-p_{12}}{1-p_{11}}\Big)^{n_1}\Big(\frac{1-p_{22}}{1-p_{21}}\Big)^{n_2}\Big(\frac{p_{12}}{1-p_{12}}\cdot\frac{1-p_{11}}{p_{11}}\Big)^{k_1}\Big(\frac{p_{22}}{1-p_{22}}\cdot\frac{1-p_{21}}{p_{21}}\Big)^{k_2}. \tag{5\cdot3\cdot8}$$


Now, it is obvious that to select values of $k_1$ and $k_2$ so as to ascribe small values to $\lambda^*(k_1,k_2)$ means in effect to select $k_1$ and $k_2$ so as to ascribe small values to the product of the two last factors in the right-hand side of (5·3·8), since the first two factors do not depend on $k_1$ and $k_2$. Thus, instead of trying to find values of $k_1$ and $k_2$ to satisfy the inequality $\lambda^*(k_1,k_2) \le t$, where $t$ is some positive number, we may try to satisfy the inequality, say

$$a^{k_1}\,b^{k_2} \le t_1, \tag{5\cdot3\cdot9}$$

where $t_1$ is a positive number depending on the original $t$ and where, for the sake of brevity in writing,

$$a = \frac{p_{12}}{1-p_{12}}\cdot\frac{1-p_{11}}{p_{11}}, \qquad b = \frac{p_{22}}{1-p_{22}}\cdot\frac{1-p_{21}}{p_{21}}. \tag{5\cdot3\cdot10}$$


Further, to make $a^{k_1}b^{k_2} \le t_1$ it is obviously necessary and sufficient to make the logarithm of the left-hand side of (5·3·9) $\le \log t_1$, or

$$k_1\log a + k_2\log b \le \log t_1 = t_2, \text{ say}.$$

Since this last inequality is equivalent to the original $\lambda^*(k_1,k_2) \le t$, it follows that, to sort the possible sample points in order of increasing $\lambda^*$-criterion, it is sufficient to sort them according to a much simpler criterion, say

$$L(k_1,k_2) = k_1\log a + k_2\log b, \tag{5\cdot3\cdot11}$$

and this can be done easily without the previous computation of $p_{X_1,X_2}(e\mid H)$ and $p_{X_1,X_2}(e\mid\bar H)$.

Substituting $p_{11} = .1$, $p_{21} = .2$, $p_{12} = .4$ and $p_{22} = .9$ into (5·3·10), we have

$$a = \frac{.4}{.6}\cdot\frac{.9}{.1} = 6, \qquad b = \frac{.9}{.1}\cdot\frac{.8}{.2} = 36 = 6^2.$$

Noticing that $\log 36 = 2\log 6$, we reduce formula (5·3·11) to

$$L(k_1,k_2) = (k_1 + 2k_2)\log 6.$$

Since $\log 6 > 0$, it appears that the simplest criterion for classifying the possible sample points is, say,

$$L_1(k_1,k_2) = k_1 + 2k_2.$$

The smaller the value of $L_1$, the smaller is the value of the $\lambda^*$-criterion.

Using the criterion $L_1$, the ordering of the possible sample points is very easy and proceeds as illustrated below.

Point  (k1, k2)  L1(k1, k2)
e1     (0,0)     0
e2     (1,0)     1
e3     (2,0)     2
e4     (0,1)     2
e5     (3,0)     3
e6     (1,1)     3
e7     (4,0)     4
e8     (2,1)     4
e9     (0,2)     4
etc.


From now on, the procedure in determining the B.C.R. consists of computing the probabilities $p_{X_1,X_2}(k_1,k_2\mid H)$ and $p_{X_1,X_2}(k_1,k_2\mid\bar H)$ for a number of points $e_1, e_2, \dots$ and evaluating the cumulative sums as explained in the preceding subsection.
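A short check of this reduction, assuming $n_1 = n_2 = 5$ together with the parameter values just substituted: it evaluates $a$ and $b$ from (5·3·10), confirms that $\lambda^*(k_1,k_2) = \lambda^*(0,0)\,6^{k_1+2k_2}$, and lists the first points in the order of $L_1$, with ties broken by decreasing $k_1$ as in the listing above.

```python
from fractions import Fraction as F

# Assumed parameters: Hbar -> (p11, p21), H -> (p12, p22); five trials per test.
p11, p21, p12, p22 = F(1, 10), F(1, 5), F(2, 5), F(9, 10)
n1 = n2 = 5

# Formula (5.3.10)
a = p12 / (1 - p12) * (1 - p11) / p11
b = p22 / (1 - p22) * (1 - p21) / p21

def lam(k1, k2):
    """lambda*(k1, k2); the binomial coefficients cancel in the quotient."""
    num = p12**k1 * (1 - p12)**(n1 - k1) * p22**k2 * (1 - p22)**(n2 - k2)
    den = p11**k1 * (1 - p11)**(n1 - k1) * p21**k2 * (1 - p21)**(n2 - k2)
    return num / den

def L1(k1, k2):
    return k1 + 2 * k2

points = [(k1, k2) for k1 in range(n1 + 1) for k2 in range(n2 + 1)]
# lambda* depends on (k1, k2) only through L1 and increases with it.
assert all(lam(*e) == lam(0, 0) * 6**L1(*e) for e in points)

order = sorted(points, key=lambda e: (L1(*e), -e[0]))
print("a =", a, " b =", b)
for nu, e in enumerate(order[:9], start=1):
    print(f"e{nu}  {e}  L1 = {L1(*e)}")
```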

In summing up the contents of this subsection, we may say that the process of determining the B.C.R. for testing the simple hypothesis $H$ against a single simple alternative $\bar H$ consists of writing down the basic inequality

$$\lambda^*(e) \le t,$$

substituting in the formula for $\lambda^*(e)$ the expressions of $p_E(e\mid H)$ and $p_E(e\mid\bar H)$ in terms of the coordinates $k_1, k_2, \dots$ of the possible sample point $e$, and simplifying the resulting inequality so as to bring it into the form

$$L(k_1,k_2,\dots) \le t', \tag{5\cdot3\cdot12}$$

where $t'$ is a number depending on $t$ and $L(k_1,k_2,\dots)$ is a function of the $k$'s which is simpler to compute than $\lambda^*(e)$. Then the possible sample points are arranged in the order of increasing values of the criterion $L(k_1,k_2,\dots)$. Enough of these points, beginning with the first, have to be included in the B.C.R. to comply with the requirements concerning the level of significance.

The criterion $L(k_1,k_2,\dots)$, which is equivalent to the $\lambda^*$-criterion but simpler, may be called the "working form of the $\lambda^*$-criterion." The $\lambda^*$-criterion itself is so defined by formula (5·3·2) that its construction is automatic. On the other hand, there is no unique working form $L(k_1,k_2,\dots)$, so that the reader is free to select any function he finds convenient, provided the inequality (5·3·12) is equivalent to the inequality $\lambda^*(e) \le t$. It will be noticed that, to satisfy this requirement, it is necessary and sufficient that the working form $L(k_1,k_2,\dots)$ be a strictly increasing function of the $\lambda^*$-criterion $\lambda^*(e)$.

Let $e'$ and $e''$ be any two possible sample points with coordinates $k_1', k_2', \dots$ and $k_1'', k_2'', \dots$, respectively. The function $L(k_1,k_2,\dots)$ is a strictly increasing function of $\lambda^*(e)$ if the inequality

$$\lambda^*(e') < \lambda^*(e'') \tag{5\cdot3\cdot13}$$

always implies

$$L(k_1',k_2',\dots) < L(k_1'',k_2'',\dots) \tag{5\cdot3\cdot14}$$

and if also (5·3·14) always implies (5·3·13).
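The two-way implication between (5·3·13) and (5·3·14) can be checked mechanically over a finite sample space. In the sketch below the helper `is_working_form` is hypothetical, and the closed form $\lambda^*(k_1,k_2) = 6^{k_1+2k_2}/12^5$ is an assumption: it is what (5·3·8) gives with the values of subsection 5·3·3 and $n_1 = n_2 = 5$. Both $k_1 + 2k_2$ and $(k_1 + 2k_2)\log 6$ pass, while $k_1 + k_2$ does not.

```python
import math
from fractions import Fraction
from itertools import product

def is_working_form(points, lam, L):
    """Check that (5.3.13) and (5.3.14) imply each other: for every ordered pair
    of points, lam(e1) < lam(e2) holds exactly when L(e1) < L(e2)."""
    return all((lam(e1) < lam(e2)) == (L(e1) < L(e2))
               for e1, e2 in product(points, repeat=2))

# Assumed example: lambda*(k1, k2) = 6**(k1 + 2*k2) / 12**5 on the 36 points (k1, k2).
points = [(k1, k2) for k1 in range(6) for k2 in range(6)]

def lam(e):
    return Fraction(6) ** (e[0] + 2 * e[1]) / 12**5

print(is_working_form(points, lam, lambda e: e[0] + 2 * e[1]))                  # True
print(is_working_form(points, lam, lambda e: (e[0] + 2 * e[1]) * math.log(6)))  # True
print(is_working_form(points, lam, lambda e: e[0] + e[1]))                      # False
```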

PROBLEMS AND EXERCISES 

1. Follow the general ideas leading to the definition of the B.C.R. for testing a simple hypothesis $H$ against a simple alternative $\bar H$, and formulate the definition of the worst critical region for testing $H$ against $\bar H$.

2. With reference to the above problem, prove that the critical region $w^\lambda(t)$, defined to include all possible sample points for which $\lambda^*(e) > t$ and none of those for which $\lambda^*(e) < t$, is a worst critical region for testing $H$ against $\bar H$.

3. School officials have noticed that when spinach is served as part of the school lunch, the probability that a child chosen at random will eat his spinach is .5, and that one child's action seems to be completely independent of that of all of the others. A salesman of frozen spinach claims that 80 per cent of the children will eat his brand of spinach, which, however, is slightly more expensive. In order to test his contention, the new brand of spinach is served to five children. Assume independence and let $X$ equal the number of children who eat spinach.

(a) What is the observable random variable? What are its frequency functions under the two hypotheses considered?

(b) What are the two errors possible in testing the salesman's contention? What, then, is the hypothesis $H$ to be tested? What is the alternative $\bar H$? What level of significance $\alpha$ do you recommend?

(c) Use the short cut method to find the best critical region for testing $H$ against $\bar H$. Show all reasoning. Use the level of significance you recommend.

(d) Compute the probability of an error of the first kind.

(e) Compute the probability of an error of the second kind.

(f) Compute the power of the test.

(g) If $X = 3$, is $H$ rejected or accepted? Why?

4. In order to test whether or not a certain disease is infectious, a biologist infects five mice with the disease and puts them into a cage together with five other mice. He considers that if the disease is infectious then all of the non-injected mice will show definite signs of the disease within ten days. However, if the disease is not infectious then the probability that a mouse will contract it within ten days is only .1, and this is independent of whether or not the other mice have the disease. Let $X$ equal the number of non-injected mice who show signs of the disease.

(a) What is the observable random variable? What are its frequency functions under the two hypotheses considered? Notice that the frequency function under the hypothesis that the disease is contagious is a very special case.

(b) What are the two errors possible in testing contagion? What, then, is the hypothesis $H$ to be tested? What is the alternative $\bar H$? What level of significance $\alpha$ do you recommend?

(c) Use the short cut method to find the best critical region for testing $H$ against $\bar H$ at the level of significance you recommend. Show all reasoning.

(d) Compute the probability of an error of the first kind.

(e) Compute the probability of an error of the second kind.

(f) Compute the power of the test.

(g) If $X = 3$, is $H$ rejected or accepted? Why?

5. You are a research technician for a producer of insecticides. The performance of a certain type of insecticide is measured by the “per cent kill”; that is, by the per cent of insects in a test group which are killed by application of the insecticide. The best developed to date, we shall assume, has an average performance rating of 70 per cent kill. A new product NIP is developed in the laboratory, and the question arises as to whether or not NIP is better than the old insecticides.

If the new insecticide NIP has a per cent kill of 75 or better, the producer wants a good chance, say probability .8, of recognizing that NIP is better than the old insecticides. Consider an experiment based on applying the new insecticide to a set of $N$ insects and counting the number $X$ which are killed.

(a) What are the admissible statistical hypotheses?

(b) Which of these would you choose for the hypothesis to be tested? Why?

(c) What is meant, in terms of the insecticide, by

(i) “the probability of error of the first kind is .05”,

(ii) “the level of significance is .05”?

(d) Consider the hypotheses: per cent kill = 70 and per cent kill = 75. Use the short cut method to find the criterion only for the B.C.R. with $\alpha = .05$ and satisfying the producer's requirement. What is the first point to go into the critical region? How is the last point to go in determined (do not do arithmetic)? How would you check that the producer's requirement is satisfied?

6. The genes $g$ and $G$ are such that, in order to distinguish the recessives $gg$ from the two other types, $gG$ and $GG$, the plants must be grown in special conditions $C$. If the plants are grown in the field, there is no distinction between the genetical types with respect to $g$ and $G$. The plants multiply according to panmixia, and it is known that the proportions of the three genetical types, dominants, hybrids and recessives, are $p = .16$, $q = .48$ and $r = .36$, respectively. From the seeds produced by a plant grown in the field a sample of $n = 4$ seeds is taken at random and grown under the special conditions $C$. Let $X$ denote the number of recessives observed.

(a) Enumerate the three possible hypotheses about the plant.

(b) Compute $P\{X = k\}$ for each of these hypotheses.

(c) Set the level of significance at $\alpha = .2$ and find the most powerful critical region to test the hypothesis $H$ that the plant is a recessive $gg$ against one of the alternatives.

(d) Against the other alternative.

(e) Compute the probability of an error of the first kind, the probability of an error of the second kind and the power for each of the tests (c) and (d).

(f) Is the same region a B.C.R. for testing $H$ against both of the alternatives? Why?

7. Invent an example of a simple hypothesis $H$ and a simple alternative $\bar H$ where there are two B.C.R.'s for testing $H$ against $\bar H$, say $w'$ and $w''$, both giving the same probability of an error of the first kind, namely $\alpha' = .05$. Can it be arranged so that the power of $w'$ is greater than that of $w''$? Why?

8. Suppose that $w'$ and $w''$ are two different B.C.R.'s for testing a simple hypothesis $H$ against a simple alternative $\bar H$, that they both determine the same probability $\alpha'$ of an error of the first kind, and that the difference between $w'$ and $w''$ consists of the fact that $w'$ contains possible sample points $e_1'$ and $e_2'$ which are not included in $w''$, while $w''$ contains possible sample points $e_1''$, $e_2''$ and $e_3''$ which are not included in $w'$. What can be said about the values of the frequency functions $p_E(e\mid H)$ and $p_E(e\mid\bar H)$ at the five points mentioned? For example, can one assert that necessarily $p_E(e_1'\mid H) = p_E(e_1''\mid H)$?

9. Prove that, if the simple hypothesis tested $H$ differs from the alternative simple hypothesis $\bar H$, and if there are some points $e'$ in the sample space for which the criterion $\lambda^*(e') > 1$, then there must be some other points $e''$ for which $\lambda^*(e'') < 1$.

5·3·4. Distribution problem in testing statistical hypotheses. The short cut described in subsection 5·3·3 does lead to a considerable simplification of the process of determining the B.C.R. compared with the original steps of subsection 5·3·2. However, upon working out a few examples, the student will notice that a certain phase of the process is still very tedious. This phase is the evaluation of the probabilities $p_E(e\mid H)$ and $p_E(e\mid\bar H)$ and their cumulative sums for a number of possible sample points $e_1, e_2, \dots$, arranged in the order of increasing values of the $\lambda^*$-criterion. The problem of simplifying this phase of the process is frequently very difficult and is known as the distribution problem. Right now we are concerned with the distribution problem connected with tests of statistical hypotheses. However, exactly similar problems of distribution arise in the study of all kinds of rules of inductive behavior, not only in tests.
To explain the nature of the distribution problem we will consider the process of determining a B.C.R. from a slightly different point of view from that described above. Let $L(e)$ stand for the adopted working form of the $\lambda^*$-criterion. Using $L(e)$, we sort and number the possible sample points in the order of increasing values of $L(e)$. Thus we find the point of the sample space for which the value of $L(e)$ is the least and label it $e_1$. Frequently there is just one such point, but in some cases there are several of them, say $m_1$, and then we order them arbitrarily and number them

$$e_1, e_2, \dots, e_{m_1}.$$

For all these points the criterion $L(e)$ has the same value, say $t_1$,

$$L(e_1) = L(e_2) = \dots = L(e_{m_1}) = t_1.$$

Proceeding further, we notice that the next largest value of $L(e)$ is, say, $t_2$, and that there are one or more, say $m_2$, possible sample points for which $L(e)$ is equal to $t_2$. These $m_2$ points are ordered arbitrarily and numbered

$$e_{m_1+1}, e_{m_1+2}, \dots, e_{m_1+m_2}.$$

We have

$$L(e_{m_1+1}) = L(e_{m_1+2}) = \dots = L(e_{m_1+m_2}) = t_2 > t_1,$$

and so forth. The next step leading to the construction of the B.C.R. consists in computing the cumulative sums of the probabilities $p_E(e\mid H)$ for all possible sample points arranged in the order just described. In particular, we compute the following sums

$$\sum_{k=1}^{m_1} p_E(e_k\mid H),\qquad \sum_{k=1}^{m_1+m_2} p_E(e_k\mid H),\qquad \dots \tag{5\cdot3\cdot15}$$

and continue until we surpass the chosen value of the level of significance $\alpha$. In other words, we terminate the computation of the cumulative sums (5·3·15) after arriving at two consecutive values of $L(e)$, say $t_r$ and $t_{r+1} > t_r$, such that

$$\sum_{k=1}^{m_1+\dots+m_r} p_E(e_k\mid H) < \alpha \le \sum_{k=1}^{m_1+\dots+m_r+m_{r+1}} p_E(e_k\mid H). \tag{5\cdot3\cdot16}$$

Then the B.C.R. is defined to contain all the $m_1 + m_2 + \dots + m_r$ first possible sample points classified according to the value of $L(e)$, and some or all of the subsequent $m_{r+1}$ points. Upon examining this process more closely, especially in connection with some of the examples treated above, the reader will notice that, with two or more observable random variables each capable of assuming moderately large numbers of possible values, it is frequently not important whether the B.C.R. includes all the $m_1 + m_2 + \dots + m_r + m_{r+1}$ first possible sample points, for which $L(e) \le t_{r+1}$, or whether some of the last $m_{r+1}$ points are left out. This is a consequence of the obvious fact that, if the sample space $W$ contains some 100 or more points, then, in many cases, the probabilities of these points determined by $H$ and by $\bar H$ are very small and the difference between the extreme members of formula (5·3·16) is too small to matter. Therefore, unless the problem considered is somewhat exceptional or unless the statistician is very meticulous in adhering to the preassigned level of significance, the determination of the B.C.R. may be simplified by making a rule of including all the points in the sample space for which the corresponding value of the working criterion $L(e) \le t_{r+1}$, where $t_{r+1}$ is determined by formula (5·3·16).
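The grouping into values $t_1 < t_2 < \dots$ with multiplicities $m_1, m_2, \dots$, and the search for $t_{r+1}$ in (5·3·16), can be sketched as follows. The example again assumes the two-test problem of subsection 5·3·3 with $n_1 = n_2 = 5$, $p_{12} = .4$ and $p_{22} = .9$ under $H$, and the working form $L = k_1 + 2k_2$; the helper name `find_t_r_plus_1` is hypothetical.

```python
from fractions import Fraction as F
from itertools import groupby
from math import comb

# Assumed example: two binomial observations, n1 = n2 = 5, frequency function under H.
n1 = n2 = 5
p12, p22 = F(2, 5), F(9, 10)

def p_H(k1, k2):
    return (comb(n1, k1) * p12**k1 * (1 - p12)**(n1 - k1)
            * comb(n2, k2) * p22**k2 * (1 - p22)**(n2 - k2))

def L(e):
    """Working form of the lambda*-criterion for this example."""
    return e[0] + 2 * e[1]

def find_t_r_plus_1(alpha):
    """Return t_{r+1}, the group sizes [m1, m2, ...] and the cumulative sums (5.3.15)
    up to the first one that reaches alpha, as in (5.3.16)."""
    points = sorted(((k1, k2) for k1 in range(n1 + 1) for k2 in range(n2 + 1)), key=L)
    sizes, cums, cum = [], [], F(0)
    for t, group in groupby(points, key=L):
        group = list(group)
        sizes.append(len(group))
        cum += sum(p_H(*e) for e in group)
        cums.append(cum)
        if cum >= alpha:   # first cumulative sum reaching alpha: this t is t_{r+1}
            return t, sizes, cums
    raise ValueError("alpha exceeds 1")

t, sizes, cums = find_t_r_plus_1(F(1, 1000))
print(t, sizes, [float(c) for c in cums])
```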

Now we come to the new point of view on the process of determining the B.C.R. This consists of considering the definition of the working $\lambda^*$-criterion $L(e)$ as the definition of a new random variable, say $L(E)$, a function of the observable random variables $E = (X_1, X_2, \dots)$. The explicit definition of $L(E)$ is as follows: whenever the observable random variables $E = (X_1, X_2, \dots)$ assume values determining a possible sample point $e'$, the random variable $L(E)$ assumes the value $L(e')$. According to what was said before, this value $L(e')$ is necessarily one of the numbers $t_1 < t_2 < \dots < t_r < \dots$. Let $t_s$ be one of these numbers. Then it is a possible value of $L(E)$, and the probability determined by the hypothesis $H$ that $L(E)$ will assume this value is obviously

$$P\{L(E) = t_s\mid H\} = \sum_{k=m_1+\dots+m_{s-1}+1}^{m_1+\dots+m_s} p_E(e_k\mid H). \tag{5\cdot3\cdot17}$$

Similarly

$$P\{L(E) = t_s\mid\bar H\} = \sum_{k=m_1+\dots+m_{s-1}+1}^{m_1+\dots+m_s} p_E(e_k\mid\bar H). \tag{5\cdot3\cdot18}$$

Referring to subsection 4·1·2, the reader will realize that the probabilities (5·3·17) and (5·3·18) are values of the frequency functions, say $p_L(t\mid H)$ and $p_L(t\mid\bar H)$, of the random variable $L(E)$ as defined by the two hypotheses $H$ and $\bar H$, respectively. Also the cumulative sums (5·3·15) are the values of the distribution function, say $F_L(t\mid H)$, of the newly defined random variable $L(E)$. Using these concepts, formula (5·3·16) defining $t_{r+1}$ may be rewritten as

$$P\{L(E) \le t_r\mid H\} < \alpha \le P\{L(E) \le t_{r+1}\mid H\}$$

or as

$$F_L(t_r\mid H) < \alpha \le F_L(t_{r+1}\mid H).$$



The definition of $t_{r+1}$ is formulated as follows: $t_{r+1}$ is the least possible value of the working $\lambda^*$-criterion such that the corresponding value of the distribution function of $L(E)$ under $H$ is equal to or greater than the preassigned level of significance $\alpha$.

With the above definitions, the process of determining the B.C.R. for testing a simple hypothesis $H$ against a simple alternative $\bar H$ reduces to the following steps:

1. Write down the formula for the $\lambda^*$-criterion and simplify it to obtain the working criterion $L(e)$.

2. Consider the random variable $L(E)$ obtained from $L(e)$ by substituting the (random) sample point $E$ for the possible sample point $e$, and deduce the distribution functions $F_L(t\mid H)$ and $F_L(t\mid\bar H)$ of $L(E)$ as determined by the hypotheses $H$ and $\bar H$, respectively.

3. Find the least value, say $t_\alpha$, of $L(e)$ for which $F_L(t_\alpha\mid H) \ge \alpha$.

Then the B.C.R. for testing $H$ against $\bar H$ is defined to include all points of the sample space in which $L(e) \le t_\alpha$. The probability of an error of the first kind in using this critical region is equal to $F_L(t_\alpha\mid H)$ and the power to $F_L(t_\alpha\mid\bar H)$. In order to apply the test in practice, it is sufficient to obtain the observations and compute the corresponding value of the working criterion $L(e)$. If the observations yield $E = e'$, then we compute $L(e')$ and compare it with $t_\alpha$. If $L(e') \le t_\alpha$, then the sample point $E = e'$ is within the critical region and the hypothesis $H$ is rejected. Otherwise, i.e., if $L(e') > t_\alpha$, the hypothesis $H$ is accepted.
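A sketch of steps 1 to 3 for the two-test example, assuming again $n_1 = n_2 = 5$, the parameter values of subsection 5·3·3, and the working form $L = k_1 + 2k_2$. Here step 2 is carried out by brute-force summation, which is feasible only because the sample space has 36 points. With $\alpha = .001$ this rule, which takes all tied points, gives $t_\alpha = 5$, a size of about .003,150 and a power of about .925,209 (row $e_{12}$ of Table 5·7), whereas the selection made in subsection 5·3·2 stops at $e_{11}$. The function name `reject_H` is hypothetical.

```python
from fractions import Fraction as F
from math import comb

# Assumed example (two-test problem, n1 = n2 = 5); working form L = k1 + 2*k2.
n1 = n2 = 5
H    = (F(2, 5), F(9, 10))   # (p12, p22)
Hbar = (F(1, 10), F(1, 5))   # (p11, p21)

def binom(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def freq_L(hyp):
    """Step 2: frequency function of L(E) = X1 + 2*X2 under a hypothesis, t -> P{L(E) = t}."""
    pmf = {}
    for k1 in range(n1 + 1):
        for k2 in range(n2 + 1):
            t = k1 + 2 * k2
            pmf[t] = pmf.get(t, 0) + binom(k1, n1, hyp[0]) * binom(k2, n2, hyp[1])
    return pmf

def dist(pmf, t):
    """Distribution function F_L(t) = P{L(E) <= t}."""
    return sum(p for s, p in pmf.items() if s <= t)

pmf_H, pmf_Hbar = freq_L(H), freq_L(Hbar)
alpha = F(1, 1000)
t_alpha = min(t for t in pmf_H if dist(pmf_H, t) >= alpha)   # step 3
size, power = dist(pmf_H, t_alpha), dist(pmf_Hbar, t_alpha)

def reject_H(k1, k2):
    """H is rejected when the observed point lies in the B.C.R., i.e. L(e') <= t_alpha."""
    return k1 + 2 * k2 <= t_alpha

print(t_alpha, float(size), float(power))
```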

It is seen that this simple procedure depends on the possibility of performing Step 2, i.e., on the possibility of deducing formulae giving the distribution functions $F_L(t\mid H)$ and $F_L(t\mid\bar H)$. This is just the distribution problem mentioned in the title of the present subsection. In general terms it can be stated as follows:

Given several random variables $E = (X_1, X_2, \dots, X_n)$ and their frequency function, say $p_E(e)$, and also given a random variable $Y$ defined as a specified function of the random variables $E = (X_1, X_2, \dots, X_n)$, so that

$$Y = f(E),$$

deduce a formula giving the distribution function $F_Y(t)$ of $Y$,

$$F_Y(t) = P\{Y \le t\}.$$
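For a finite sample space the distribution problem can always be "solved" by enumeration, which is exactly the labor a formula is meant to avoid; the sketch below only makes the definition concrete. The function name `distribution_function` is hypothetical, and the two-dice example is invented for illustration.

```python
from fractions import Fraction
from itertools import product

def distribution_function(pmf_E, f):
    """Return F_Y(t) = P{Y <= t} for Y = f(E), given the frequency function of E
    as a dict mapping each possible sample point e to p_E(e)."""
    pmf_Y = {}
    for e, p in pmf_E.items():
        pmf_Y[f(e)] = pmf_Y.get(f(e), 0) + p   # frequency function of Y
    def F_Y(t):
        return sum((p for y, p in pmf_Y.items() if y <= t), Fraction(0))
    return F_Y

# Hypothetical example: two fair dice, Y = X1 + X2.
pmf_E = {(i, j): Fraction(1, 36) for i, j in product(range(1, 7), repeat=2)}
F_Y = distribution_function(pmf_E, lambda e: e[0] + e[1])
print(F_Y(4), F_Y(7))   # 1/6 7/12
```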

In the case of testing statistical hypotheses just described, the random variable $Y$ is either the $\lambda^*$-criterion itself or its working form $L(E)$. As mentioned, the distribution problem is frequently very difficult, in the sense that the resulting formulae are too complicated to be used in practical work. When one is faced with the insistent need of practical applications, one usually resorts to the following expedients. In certain cases, when a given test is in frequent use, an effort is made to compute a table of the distribution function $F_Y(t)$, which is then published. The tables of the Normal integral are an example of this sort. In other cases, instead of using the requisite distribution function $F_Y(t)$, the statistician is forced to use some approximation to $F_Y(t)$ which is simpler than $F_Y(t)$ itself. Thus, for example, instead of using the cumbersome distribution function of the binomial variable we frequently use the Normal approximation as described in subsection 4·5·5.

A few simple distribution problems are discussed in Volume Two of this text.
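As an illustration of the second expedient, the sketch below compares the exact binomial distribution function with a Normal approximation. The use of a continuity correction of 1/2 is an assumption; subsection 4·5·5 may present the approximation differently. The values $n = 100$, $p = .3$ are arbitrary.

```python
import math

def binom_cdf(k, n, p):
    """Exact binomial distribution function P{X <= k}."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P{X <= k}: mean np, variance np(1 - p),
    with a continuity correction of 1/2 (an assumption)."""
    z = (k + 0.5 - n * p) / math.sqrt(n * p * (1 - p))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for k in (20, 25, 30, 35):
    print(k, round(binom_cdf(k, 100, 0.3), 4), round(normal_approx_cdf(k, 100, 0.3), 4))
```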


PROBLEMS AND EXERCISES 

1. Tables 5·3 and 5·4 give the frequency functions of two random variables $X_1$ and $X_2$. Use these tables to compute the frequency functions of the random variables $Y$, $Z$, and $U$ defined as follows:

$$Y = 2X_1 + X_2,\qquad Z = X_1 + X_2,\qquad U = X_1 + 2X_2.$$

Make graphs of the frequency functions computed. Obtain and plot the distribution functions.

2. Let $X_1$ and $X_2$ be two binomial variables defined as follows: $X_1$ is the number of successes in a sequence $S_1$ of $n_1$ completely independent trials with probability of success equal to $p$; $X_2$ is the number of successes in another sequence $S_2$ of $n_2$ completely independent trials with probability of success equal to the same number $p$. Assume that the trials of $S_1$ are completely independent of the trials of $S_2$ and deduce the frequency function of the random variable $Y = X_1 + X_2$.

Hint: the problem may be solved by direct computation upon noticing that in the circumstances described

$$P\{(X_1 = k_1)(X_2 = k_2)\} = P\{X_1 = k_1\}\,P\{X_2 = k_2\}$$

for $k_1 = 0, 1, 2, \dots, n_1$ and $k_2 = 0, 1, 2, \dots, n_2$. Another and more elegant way of obtaining the same result (applicable only when the probabilities of success in the two sets of trials are equal) is through an effort at defining $Y$ not as the sum of two binomial variables but rather in terms of the successes in the various trials. The student should solve the problem by both methods suggested.

5·3·5. Standard family of B.C.R.'s. It is important to realize that, while the critical region defined in the theorem on best critical regions is necessarily a B.C.R. for testing $H$ against a simple alternative $\bar H$, there may exist best critical regions which do not conform to the definition of $w_\lambda(t)$. In other words, the critical regions designated by the symbol $w_\lambda(t)$ need not exhaust all the best critical regions for testing $H$ against $\bar H$. To see this, consider three possible sample points, $e'$, $e''$, and $e'''$, which are the least probable according to the hypothesis tested $H$, so that, whatever other possible sample point $e$,

$$0 < p_E(e'\mid H) < p_E(e''\mid H) < p_E(e'''\mid H) < p_E(e\mid H). \tag{5\cdot3\cdot19}$$

Assume further that

$$p_E(e'\mid H) + p_E(e''\mid H) < p_E(e'''\mid H) \tag{5\cdot3\cdot20}$$

and that

$$p_E(e'\mid H) > p_E(e'\mid\bar H) > 0, \tag{5\cdot3\cdot21}$$

$$p_E(e''\mid H) > p_E(e''\mid\bar H) > 0. \tag{5\cdot3\cdot22}$$

Obviously, the assumptions made have nothing unusual in them, and the reader will have no difficulty in inventing numerical examples of hypotheses $H$ and $\bar H$ in which these assumptions are satisfied. Consider the regions $w'$, $w''$ and $w'''$, where $w'$ is defined to contain just one possible sample point $e'$, the region $w''$ is defined to contain only $e''$, and $w'''$ is defined to contain exactly two possible sample points, $e'$ and $e''$. We shall show easily that both $w'$ and $w'''$ are B.C.R.'s for testing $H$ against $\bar H$ and that they do not conform to the definition of $w_\lambda(t)$.

The region $w'$ is a B.C.R. with the probability of error of the first kind equal to, say,

$$\alpha' = p_E(e' \mid H).$$

The reason is that, since on the hypothesis $H$ the point $e'$ is the least probable possible sample point, whatever critical region $w$ we take, if it contains possible sample points at all and differs from $w'$, the corresponding probability of the error of the first kind must be greater than $\alpha'$. Thus, there exists but one critical region with probability of error of the first kind not exceeding $\alpha'$ (namely $w'$), and its power does not exceed that of $w'$ (because it is equal to the power of $w'$). Hence $w'$ satisfies the definition of a B.C.R.

While the above reasoning does involve some artificiality, there is no artificiality in the assertion that $w'''$ is a B.C.R. To prove that $w'''$ is a B.C.R. we notice first that the probability of an error of the first kind corresponding to $w'''$ is, say,

$$\alpha''' = p_E(e' \mid H) + p_E(e'' \mid H).$$

Because of (5·3·20) and (5·3·19) it follows that there are just two regions different from $w'''$ for which the probability of error of the first kind does
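not exceed $\alpha'''$. Namely, they are the regions $w'$ and $w''$. The powers of the three regions $w'$, $w''$ and $w'''$ are

$$\beta(w') = p_E(e' \mid \bar H), \qquad \beta(w'') = p_E(e'' \mid \bar H), \qquad \beta(w''') = p_E(e' \mid \bar H) + p_E(e'' \mid \bar H),$$

and, since both terms of $\beta(w''')$ are positive by (5·3·21) and (5·3·22), it is obvious that $\beta(w') < \beta(w''')$ and $\beta(w'') < \beta(w''')$.

Thus, $w'''$ is a B.C.R. for testing $H$ against $\bar H$ with the probability of error of the first kind equal to $\alpha'''$.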

There remains to be shown that neither $w'$ nor $w'''$ conforms to the definition of $w_\lambda(t)$. For this purpose we notice that, because of the inequalities (5·3·21) and (5·3·22), the values of the λ-criterion at the points $e'$ and $e''$ are greater than unity. Therefore, if $w'''$ were a best critical region of type $w_\lambda(t)$, the value of $t$ could not be less than $\lambda(e') > 1$, and the region would have to contain all the possible sample points $e$ for which $\lambda(e) < 1$. Such points certainly exist (see subsection 5·3·3, Problem 10) because, otherwise, the probabilities of all possible sample points computed on the hypothesis $H$ and those computed on $\bar H$ could not both add up to unity. However, $w'''$ does not contain any possible sample points other than $e'$ and $e''$. Therefore it does not conform with the definition of $w_\lambda(t)$. Neither does $w'$.
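As the text suggests, a numerical example is easy to invent. The sketch below uses one such invented example (the five probabilities under $H$ and under $\bar H$ are hypothetical; points `a`, `b`, `c` play the roles of $e'$, $e''$, $e'''$) and checks by brute force over all $2^5$ regions, in exact fractions, that $w'$ and $w'''$ are B.C.R.'s although neither is a standard region $w_\lambda(t)$. All function names are ours.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical probabilities of five possible sample points under H and H-bar.
p_H    = {"a": F(5, 100), "b": F(10, 100), "c": F(20, 100), "d": F(30, 100), "f": F(35, 100)}
p_Hbar = {"a": F(2, 100), "b": F(5, 100),  "c": F(3, 100),  "d": F(40, 100), "f": F(50, 100)}
points = sorted(p_H)

# Check the assumptions (5.3.19)-(5.3.22) for this example.
assert 0 < p_H["a"] < p_H["b"] < p_H["c"] < min(p_H["d"], p_H["f"])
assert p_H["a"] + p_H["b"] < p_H["c"]
assert p_H["a"] > p_Hbar["a"] > 0 and p_H["b"] > p_Hbar["b"] > 0

def alpha(w):
    """Probability of an error of the first kind for region w."""
    return sum((p_H[e] for e in w), F(0))

def power(w):
    """Probability that the sample point falls in w when H-bar is true."""
    return sum((p_Hbar[e] for e in w), F(0))

def is_bcr(w):
    """True if no region with first-kind error <= alpha(w) is more powerful."""
    return all(power(v) <= power(w)
               for r in range(len(points) + 1)
               for v in combinations(points, r)
               if alpha(v) <= alpha(w))

def lam(e):
    """The lambda-criterion p(e | H) / p(e | H-bar); all p_Hbar are positive here."""
    return p_H[e] / p_Hbar[e]

def is_standard(w):
    """True if w = w_lambda(t) for some t: every point inside w has lambda
    no larger than every point outside w."""
    inside = [lam(e) for e in w]
    outside = [lam(e) for e in points if e not in w]
    return not inside or not outside or max(inside) <= min(outside)

for w in [("a",), ("a", "b")]:
    print(w, "alpha =", alpha(w), "B.C.R.:", is_bcr(w), "standard:", is_standard(w))
```

The characterization used in `is_standard` (largest λ inside not exceeding smallest λ outside) is our restatement of the definition of $w_\lambda(t)$, not a formula from the text.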

As a result of this discussion we come to the conclusion that the theorem on best critical regions gives us means of constructing only B.C.R.'s of a special category or, as we shall say, of a special family. This family includes a considerable number of the B.C.R.'s corresponding to a series of probabilities of the error of the first kind which increase by steps from the lowest value, which is ordinarily quite small, up to unity. Among these critical regions it is usually possible to select one whose probability of error of the first kind is sufficiently close to the prescribed level of significance. Because of the easy method of construction (and also for certain other reasons), the family of best critical regions $w_\lambda(t)$ plays an important role in the theory of testing statistical hypotheses and it is well to attach a special label to it. We shall call it the standard family of best critical regions.

Definition 5·9. We will use the expression standard best critical region for testing a simple hypothesis $H$ against a simple alternative $\bar H$ to denote every region $w_\lambda(t)$ defined to include all possible sample points $e'$ for which the λ-criterion $\lambda(e') < t$, none of those possible sample points $e''$ for which $\lambda(e'') > t$, and any number of those points $e'''$ for which $\lambda(e''') = t$. The family of all such regions $w_\lambda(t)$, corresponding to varying values of $t > 0$, will be called the standard family of the B.C.R.'s for testing $H$ against $\bar H$.
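The construction behind Definition 5·9 can be sketched in a few lines, assuming the sample space is finite and both frequency functions are given as tables. Adding points one at a time in order of increasing λ (ties broken arbitrarily, which the definition permits) yields a nested sequence of standard regions, each with its $\alpha$ and power. The selection rule in `closest_standard_region` (largest member with $\alpha$ not exceeding the level) is our illustrative choice, and the four-point example is hypothetical.

```python
from fractions import Fraction as Fr

def lam(ph, pa):
    """lambda-criterion p(e | H) / p(e | H-bar); taken as infinite where p(e | H-bar) = 0."""
    return Fr(ph) / Fr(pa) if pa > 0 else float("inf")

def standard_family(p_H, p_Hbar):
    """Nested standard regions w_lambda(t): points are added one at a time in
    order of increasing lambda. Returns a list of (region, alpha, power)."""
    order = sorted(p_H, key=lambda e: lam(p_H[e], p_Hbar[e]))
    family, region = [], []
    alpha = power = Fr(0)
    for e in order:
        region.append(e)
        alpha += p_H[e]
        power += p_Hbar[e]
        family.append((tuple(region), alpha, power))
    return family

def closest_standard_region(family, level):
    """Largest member of the family whose alpha does not exceed `level`
    (an illustrative selection rule, not one prescribed by the text)."""
    best = ((), Fr(0), Fr(0))
    for member in family:
        if member[1] <= level:
            best = member
    return best

# Hypothetical four-point example.
ph = {0: Fr(1, 10), 1: Fr(2, 10), 2: Fr(3, 10), 3: Fr(4, 10)}
pa = {0: Fr(4, 10), 1: Fr(3, 10), 2: Fr(2, 10), 3: Fr(1, 10)}
for region, a, b in standard_family(ph, pa):
    print(region, "alpha =", a, "power =", b)
```

The probabilities of the error of the first kind increase by steps, as described above, so an exactly prescribed level can in general only be approached, not attained.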

PROBLEMS AND EXERCISES 

1. Consider Tables 5·3 and 5·4, which give the values of the frequency functions of two random variables $X_1$ and $X_2$, as defined by the hypothesis
tested $H$ and by a simple alternative $\bar H$. Determine a nonstandard best critical region $w$ for testing $H$ against $\bar H$. Can $w$ contain more than one point?

2. A player in a card game brings a deck which he claims to consist of 20 cards, namely, 4 aces, 4 kings, 4 queens, 4 jacks, and 4 tens. You think that there may be only 19 cards, one ace being gone, and decide to test this in an unobtrusive way. When your hand of eight cards is dealt, you will count $X$, the number of aces among the eight cards, and use the value of $X$ to decide whether or not to continue in the game.

(a) Compute $P\{X = k\}$ under the hypothesis $H_1$ that the deck contains 20 cards (4 aces) and under the hypothesis $H_2$ that the deck contains 19 cards (3 aces). Verify that $P\{X = k \mid H_1\}$ = .102, .363, .381, .139, and .014 for $k$ = 0, 1, 2, 3, and 4, respectively.

(b) Is $H_1$ or $H_2$ the hypothesis to be tested? Why?

(c) Let the level of significance be $\alpha = .25$ and use the short cut method to find a standard B.C.R. for testing $H_1$ against $H_2$. Verify that the λ-criterion is $12/[5(4 - k)]$ and that the standard B.C.R. is $X = 0$. What is the simplest working criterion $L$ you can suggest?

(d) Exhibit a critical region $w$ more powerful than the standard B.C.R. Why are you able to do this? Is the existence of $w$ a contradiction of the fundamental lemma?

(e) Compute the power of the regions found in (c) and (d). 

Perhaps you can obtain a more powerful test by observing both $X$, the number of aces, and $Y$, the number of kings, in an eight-card hand.

(f) Write down the formula for $P\{(X = k)(Y = m)\}$ under $H_1$ and then under $H_2$.

(g) Again letting the level of significance be $\alpha = .25$, use the short cut method to find a standard B.C.R. for testing $H_1$ against $H_2$. Verify that the λ-criterion is again $12/[5(4 - k)]$. Is this surprising?

(h) Find the most powerful of the standard B.C.R.'s for which $\alpha$ is .25. Compute its power. Is there a more powerful critical region corresponding to the same level of significance?
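Part (a) and the λ-criterion quoted in part (c) can be checked with exact arithmetic. A minimal sketch, assuming only that every eight-card hand from the deck is equally likely (so that $X$ is hypergeometric); the function name `p_aces` is ours, and the remaining parts of the problem are left to the reader.

```python
from fractions import Fraction
from math import comb

def p_aces(k, aces, deck, hand=8):
    """P{X = k}: k aces in a random hand of `hand` cards from a deck of
    `deck` cards of which `aces` are aces (math.comb gives 0 if k > aces)."""
    return Fraction(comb(aces, k) * comb(deck - aces, hand - k), comb(deck, hand))

p_H1 = [p_aces(k, 4, 20) for k in range(5)]   # H1: 20 cards, 4 aces
p_H2 = [p_aces(k, 3, 19) for k in range(5)]   # H2: 19 cards, 3 aces

print([round(float(p), 3) for p in p_H1])

# lambda-criterion P{X = k | H1} / P{X = k | H2}; for k = 4 it is infinite.
for k in range(4):
    assert p_H1[k] / p_H2[k] == Fraction(12, 5 * (4 - k))
```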

5·4. Test of a simple hypothesis against a composite alternative

5·4·1. Uniformly most powerful tests. In the simplest case of testing hypotheses considered in Section 5·3, where a simple hypothesis $H$ is tested against a simple alternative $\bar H$, the problem of the choice of a test is solved by the theorem on best critical regions. In fact, the use of the B.C.R. guarantees the control of errors of the first kind at approximately the preassigned level $\alpha$ and, at the same time, insures the minimum probability of error of the second kind. Unfortunately, however, cases of testing simple hypotheses against simple alternatives are rare.



The more typical cases are those where either the alternative hypothesis $\bar H$ or both the hypothesis tested $H$ and the alternative $\bar H$ are composite. In these circumstances the problem of the choice of an appropriate critical region is less simple and involves many interesting ramifications. Some of these ramifications will be considered below. First of all, we may think of a direct extension of the concept of best critical region provided by the concept of uniformly most powerful critical region.

As usual, let $E$ stand for the set of observable random variables, $\Omega$ for the set of admissible simple hypotheses and $H$ for the hypothesis tested. We will assume that $H$ is a simple hypothesis and that $\Omega$ contains more than two elements, so that the alternative hypothesis $\bar H$ is composite. We will use the letter $h$ to denote a simple hypothesis belonging to $\bar H$, so that $h$ is an element of $\Omega$ and is distinct from $H$. Then $\bar H$ is the logical sum of all hypotheses $h$. Again, as formerly, the symbols $p_E(e \mid H)$ and $p_E(e \mid h)$ will be used to denote the frequency functions of $E$ specified by the hypotheses $H$ and $h$, respectively.

The theorem on best critical regions gives an easy way to determine a standard B.C.R. for testing $H$ against any single hypothesis $h$. However, for each hypothesis $h$ there may be more than one B.C.R. guaranteeing the same probability of error of the first kind. Generally, let $w(h)$ denote a B.C.R. for testing $H$ against a specified admissible simple hypothesis $h$ and let $\alpha'$ be the corresponding probability of the error of the first kind. Now consider two different simple admissible hypotheses $h'$ and $h''$, both distinct from $H$, and the corresponding B.C.R.'s, $w(h')$ and $w(h'')$. It is possible that every B.C.R. $w(h')$ is different from every B.C.R. $w(h'')$. In this case, there is obviously an embarrassment in choosing a critical region for testing $H$ against both of the hypotheses $h'$ and $h''$ at once. On the other hand, if it happens that among the B.C.R.'s for testing $H$ against $h'$ there is one, say $w_0$, which is also a B.C.R. for testing $H$ against $h''$, then the use of the critical region $w_0$ will guarantee a probability of error of the first kind equal to $\alpha'$ and insure the minimum probability of error of the second kind, irrespective of whether $h'$ or $h''$ happens to be true. Hence, no better critical region than $w_0$ can be suggested. These considerations lead to the following definition.

Definition 5·10. We say that a critical region $w_0$ is a uniformly most powerful critical region (U.M.P.C.R.) for testing the simple hypothesis $H$ against a composite alternative $\bar H$ if, whatever be the simple admissible hypothesis $h$ belonging to $\bar H$, the region $w_0$ is a best critical region for testing $H$ against $h$.

As already mentioned, given $H$, the simple hypothesis tested, and $\bar H$, the composite alternative, a U.M.P.C.R. need not exist. The usual procedure for looking for a U.M.P.C.R. consists of considering any particular simple hypothesis $h$ belonging to $\bar H$ and determining a standard B.C.R. for testing $H$ against $h$. We may denote this B.C.R. by $w_\lambda(t, h)$. Next we vary $h$, substituting for it in turn other simple hypotheses $h'$, $h''$, etc., all belonging to $\bar H$. If it happens that the original critical region $w_\lambda(t, h)$ has the property of a B.C.R. for testing $H$ against any and all simple hypotheses belonging to $\bar H$, then it is a U.M.P.C.R. for testing $H$ against $\bar H$.

5·4·2. Search for U.M.P.C.R. in the case of a simple hypothesis concerning the binomial distribution. To illustrate the method of searching for a uniformly most powerful critical region outlined above, consider the case where there is a single observable random variable, say $X_n$, known to be a binomial variable with frequency function generated by the expansion of $(q + pu)^n$, where $n$ is a known number but where the value of $p = 1 - q$ is subject to doubt. In other words, we assume that the set $\Omega$ of admissible simple hypotheses consists of all hypotheses $h$ which assert that the frequency function of $X_n$ has the form, say,

$$p_{X_n}(k \mid p) = C_n^k\, p^k (1-p)^{n-k} \tag{5·4·1}$$

for $k = 0, 1, 2, \ldots, n$, and which ascribe to $p$ some specified values between zero and unity. Let $H$ be the hypothesis tested and let $p_0$ denote the value of $p$ specified by $H$. We will assume that $p_0$ is different from zero and from unity, so that among the admissible hypotheses are some which assert that $p < p_0$ and also some which assert that $p > p_0$. In these circumstances the hypothesis $\bar H$ alternative to $H$ is composite and asserts the frequency function (5·4·1) with $p \ne p_0$. We will consider the problem of determining a uniformly most powerful critical region for testing $H$ against $\bar H$.

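Following the method indicated in the preceding subsection, we select a particular admissible simple hypothesis $h'$ belonging to $\bar H$ and determine standard best critical regions for testing $H$ against $h'$. Let $p'$ be the value of $p$ specified by $h'$. The λ-criterion, which we shall now denote by $\lambda(k, p')$, has the form

$$\lambda(k, p') = \frac{C_n^k\, p_0^k (1-p_0)^{n-k}}{C_n^k\, p'^k (1-p')^{n-k}} = \left(\frac{1-p_0}{1-p'}\right)^{n} \left[\frac{p_0(1-p')}{p'(1-p_0)}\right]^{k}.$$

The simplest working form of this criterion is, say,

$$L(k, p') = \log \lambda(k, p') - n \log \frac{1-p_0}{1-p'} = k \log A \tag{5·4·2}$$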



with

$$A = \frac{p_0(1-p')}{p'(1-p_0)}$$

for short. It is obvious that sorting the possible sample points according to increasing values of $\lambda(k, p')$ is perfectly equivalent to sorting them according to increasing values of $L(k, p')$.

We now come to a difficulty originating from the fact that $\log A$ may be positive or negative, according to whether $A$ is greater or less than unity. If $A > 1$ and, therefore, $\log A > 0$, then the smaller the value of $k$, the smaller the value of $L(k, p')$. On the other hand, if $A < 1$ and, therefore, $\log A < 0$, then the smaller the value of $k$, the larger the value of $L(k, p')$. Simple algebra shows that $A$ is greater than, equal to, or less than unity, according to whether $p'$ is less than, equal to, or greater than $p_0$ (write $A = \frac{p_0}{p'}\cdot\frac{1-p'}{1-p_0}$: when $p' < p_0$ both factors exceed unity, and when $p' > p_0$ both are less than unity). It follows, then, that if the hypothesis $h'$ specifies a value $p'$ of $p$ less than $p_0$, then the standard B.C.R.'s for testing $H$ against $h'$ are composed of the points with the least abscissae $k = 0, 1, 2, \ldots, t(n)$, where the choice of $t(n)$ depends on the chosen level of significance. On the other hand, if the hypothesis $h'$ specifies a value $p'$ of $p$ which is greater than $p_0$, then the standard B.C.R.'s for testing $H$ against $h'$ are composed of the sample points with the greatest abscissae, say $n, n - 1, \ldots, T(n)$. Here again the choice of $T(n)$ depends upon the value of the level of significance. In the two cases, the B.C.R.'s differ sharply and, therefore, in the problem considered there is no uniformly most powerful critical region among the standard B.C.R.'s.
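The conflict between the two cases can be seen by sorting the sample points by λ directly. The sketch below uses exact fractions and arbitrary illustrative values ($n = 10$, $p_0 = \tfrac12$, $p' = 0.3$ and $p' = 0.7$); the helper names are ours. For $p' < p_0$ the standard regions absorb the smallest $k$ first, for $p' > p_0$ the largest.

```python
from fractions import Fraction
from math import comb

def lam(k, n, p0, p1):
    """lambda(k, p1) = P{X_n = k | p0} / P{X_n = k | p1} for the binomial (5.4.1)."""
    return (comb(n, k) * p0**k * (1 - p0)**(n - k)) / (comb(n, k) * p1**k * (1 - p1)**(n - k))

def A(p0, p1):
    """The constant A of (5.4.2), so that log lambda(k, p1) = const + k log A."""
    return p0 * (1 - p1) / (p1 * (1 - p0))

def standard_order(n, p0, p1):
    """Order in which standard B.C.R.'s for H (p = p0) against h' (p = p1)
    absorb the sample points: by increasing lambda(k, p1)."""
    return sorted(range(n + 1), key=lambda k: lam(k, n, p0, p1))

n, p0 = 10, Fraction(1, 2)
for p1 in (Fraction(3, 10), Fraction(7, 10)):
    print("p' =", p1, " A =", A(p0, p1), " order:", standard_order(n, p0, p1))
```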

The case just considered is known as the test of the hypothesis $H$ against a "two-sided set of alternatives." By this we mean that the set $\Omega$ contains hypotheses ascribing to $p$ values on both sides of $p_0$, some of them smaller than $p_0$ and some larger than $p_0$. However, in certain cases it is appropriate to consider "one-sided sets of alternatives."

The term is applicable to the set, say $\Omega_1$, of simple hypotheses asserting (5·4·1) and ascribing to $p$ values within the limits $p_0 \le p \le 1$. It also applies to the set $\Omega_2$ of hypotheses specifying the values of $p$ between the limits $0 \le p \le p_0$. The two-sided set $\Omega$ considered above is the sum of the sets $\Omega_1$ and $\Omega_2$.

The above discussion of the case of a two-sided set of alternatives indicates that, for binomial variables, in the one-sided case every standard B.C.R. for testing $H$ against a simple hypothesis belonging to $\bar H$ is, at the same time, a uniformly most powerful critical region for testing $H$ against the alternative $\bar H$.

If, for example, we assume the set $\Omega_1$ as the set of admissible simple hypotheses, then the alternative hypothesis, say $\bar H_1$, asserts that $p > p_0$. Returning to (5·4·2) and remembering that $p' > p_0$ implies $\log A < 0$, we see that, whichever hypothesis $h'$ belonging to $\bar H_1$ we choose to consider, the corresponding standard B.C.R. must contain the point $k = n$ and some of the succeeding points, $n - 1, n - 2, \ldots, T(n)$, where $T(n)$ is chosen in accordance with the preassigned level of significance. Since this choice depends only on probabilities computed under $H$, it will be the same whatever $h'$, and it follows that in this particular case the critical region composed of the points $k = n, n - 1, \ldots, T(n)$ is the uniformly most powerful critical region for testing $H$ against $\bar H_1$. Similar reasoning applies to the case where the set of admissible simple hypotheses is $\Omega_2$.
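A small exact check of this argument, with hypothetical values $n = 10$, $p_0 = \tfrac12$ and $\alpha = .05$: for every $p'$ tried above $p_0$ the largest standard region whose first-kind error does not exceed $\alpha$ is the same set $\{9, 10\}$, and only its power changes with $p'$. Taking "the largest such standard region" is our selection rule; the function names are ours.

```python
from fractions import Fraction
from math import comb

def pmf(k, n, p):
    """Binomial frequency function (5.4.1)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def standard_bcr(n, p0, p1, level):
    """Largest standard B.C.R. for H (p = p0) against h' (p = p1) whose
    probability of an error of the first kind does not exceed `level`."""
    order = sorted(range(n + 1), key=lambda k: pmf(k, n, p0) / pmf(k, n, p1))
    region, alpha = [], Fraction(0)
    for k in order:
        if alpha + pmf(k, n, p0) > level:
            break
        region.append(k)
        alpha += pmf(k, n, p0)
    return sorted(region), alpha

n, p0, level = 10, Fraction(1, 2), Fraction(5, 100)
for p1 in (Fraction(6, 10), Fraction(7, 10), Fraction(9, 10)):
    region, alpha = standard_bcr(n, p0, p1, level)
    power = sum(pmf(k, n, p1) for k in region)
    print("p' =", p1, "region:", region, "alpha =", alpha, "power =", round(float(power), 4))
```

Running the same function with some $p' < p_0$ gives a region at the other end of the scale, which is the two-sided difficulty in miniature.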

The reader will remember that in studying the problem of the Lady tasting tea we mentioned the possibility of considering one-sided sets of admissible hypotheses.

PROBLEMS AND EXERCISES 

1. Consider the situations described in each of the first three exercises of subsection 5·2·7, formulate a simple hypothesis $H$ which is tested in each case and the appropriate composite alternative $\bar H$, and see whether or not a uniformly most powerful critical region for testing $H$ against $\bar H$ can be found.

2. Consider the situation described in Problem 3 of subsection 5·2·7, the hypothesis $H$ which is tested and the composite alternative $\bar H$. It is
indicated that for each value of pi between the limits h ^ Vi ^ the 
probability p* = f + 2(pi — ^). Drop this assumption and find some 
other functional relationship between pi and p* with which a uniformly 
most powerful test of the hypothesis H will exist. 

5·4·3. Use of the normal approximation in testing hypotheses relating to binomial variables. In subsection 5·4·2 we considered the general type of simple hypothesis relating to a binomial variable $X_n$. This is the hypothesis $H$ which asserts that the value of $p$ is $p_0$. We found that, if the set of admissible simple hypotheses is one-sided, either $\Omega_1$ or $\Omega_2$, then standard uniformly most powerful critical regions exist. In tests of $H$ against the alternative, say $\bar H_1$, which asserts that $p > p_0$, the standard U.M.P.C.R.'s, say $w_1$, are determined by the inequality

$$X_n \ge T(n).$$

On the other hand, in tests of $H$ against the alternative, say $\bar H_2$, which asserts that $p < p_0$, the standard U.M.P.C.R.'s, say $w_2$, are determined by the inequality

$$X_n \le t(n).$$

In both cases, the critical values $t(n)$ and $T(n)$ have to be determined in conformity with the chosen level of significance $\alpha$. Thus $T(n)$ stands for the smallest integer such that

$$P\{X_n \ge T(n) \mid p_0\} \le \alpha$$

and $t(n)$ is the greatest integer such that

$$P\{X_n \le t(n) \mid p_0\} \le \alpha.$$
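For moderate $n$ the exact critical values are easily found by machine. The sketch below computes the smallest $T(n)$ with $P\{X_n \ge T(n) \mid p_0\} \le \alpha$ and the greatest $t(n)$ with $P\{X_n \le t(n) \mid p_0\} \le \alpha$ by summing binomial terms; the function names and the conventions $T = n + 1$ and $t = -1$ for an empty region are our assumptions.

```python
from math import comb

def upper_tail(n, p, T):
    """P{X_n >= T | p} for a binomial variable (zero if T > n)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(T, n + 1))

def lower_tail(n, p, t):
    """P{X_n <= t | p} for a binomial variable (zero if t < 0)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, t + 1))

def T_crit(n, p0, alpha):
    """Smallest T with P{X_n >= T | p0} <= alpha; T = n + 1 means an empty region."""
    T = n + 1
    while T > 0 and upper_tail(n, p0, T - 1) <= alpha:
        T -= 1
    return T

def t_crit(n, p0, alpha):
    """Greatest t with P{X_n <= t | p0} <= alpha; t = -1 means an empty region."""
    t = -1
    while t < n and lower_tail(n, p0, t + 1) <= alpha:
        t += 1
    return t

print(T_crit(10, 0.5, 0.05), t_crit(10, 0.5, 0.05))
```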

For any given value of $n$, the search for the exact values of $t(n)$ and $T(n)$ was illustrated above, particularly in the discussion of the problem of the Lady tasting tea. On that occasion the reader must have noticed that, although the computations are intrinsically simple, they are very tedious when $n$ is only moderately large. When $n \le 50$, and occasionally when $n$ is somewhat above this limit, these computations can be avoided by the use of the Incomplete Beta Function Tables, due to Karl Pearson [11], with which in due course the reader will become acquainted. In the present subsection we will consider the method of using the Normal approximation described in subsection 4·5·5 as a means of determining approximate values of $t(n)$ and $T(n)$. This method is particularly useful when, instead of determining the critical regions $w_1$ or $w_2$ for given $n$ and $\alpha$, we are seeking to determine the value of $n$ with which the power of these critical regions with respect to a given simple admissible hypothesis $h$ has a preassigned value, say $\beta$. It will be sufficient to consider only one of the two cases of one-sided alternatives, say the first. As an easy exercise the reader may wish to consider the other case.

Consider, then, a simple hypothesis $h$ ascribing to $p$ a value $p' > p_0$. The power of the critical region $w_1$ at the point $p'$ is

$$\beta(p' \mid w_1) = P\{X_n \ge T(n) \mid p'\}.$$

The probability of an error of the first kind in using this region is, say,

$$\alpha' = P\{X_n \ge T(n) \mid p_0\}.$$

We consider the problem of determining $n$ and $T(n)$ so that $\alpha'$ is approximately equal to $\alpha$, the preassigned level of significance, and so that $\beta(p' \mid w_1)$ is approximately equal to $\beta$, the preassigned power at $p'$.

Using the theorem of Laplace (subsection 4·5·4) and assuming that the value of $n$ sought is substantial, we conclude that the probability

$$P\{X_n \ge T(n) \mid p_0\}$$

is approximately equal to the normal integral, say

$$\frac{1}{\sqrt{2\pi}} \int_{\tau_1}^{\infty} e^{-x^2/2}\, dx = 1 - G(\tau_1),$$

where, according to the rule set forth in subsection 4·5·5, the connection between $\tau_1$ on one hand and $T(n)$, $p_0$, and $n$ on the other is

$$\tau_1 = \frac{T(n) - \tfrac12 - np_0}{\sqrt{np_0(1-p_0)}}. \tag{5·4·3}$$



Similarly, $P\{X_n \ge T(n) \mid p'\}$ is approximately equal to $1 - G(\tau_2)$, where

$$\tau_2 = \frac{T(n) - \tfrac12 - np'}{\sqrt{np'(1-p')}}. \tag{5·4·4}$$

We use the Tables of the Normal Integral and read off the values of $\tau_1$ and $\tau_2$ so as to have

$$1 - G(\tau_1) = \alpha, \qquad 1 - G(\tau_2) = \beta.$$

Then, since $p_0$ and $p'$ are given in the conditions of the problem, equations (5·4·3) and (5·4·4) form a system from which the two unknowns, $n$ and $T(n)$, can be computed. Easy transformations give

$$T(n) - \tfrac12 - np_0 = \tau_1 \sqrt{np_0(1-p_0)}, \tag{5·4·5}$$

$$T(n) - \tfrac12 - np' = \tau_2 \sqrt{np'(1-p')}. \tag{5·4·6}$$

Upon subtracting (5·4·6) from (5·4·5) and upon dividing both sides by $(p' - p_0)\sqrt{n}$, we obtain

$$\sqrt{n} = \frac{\tau_1 \sqrt{p_0(1-p_0)} - \tau_2 \sqrt{p'(1-p')}}{p' - p_0}. \tag{5·4·7}$$

The requisite value of $n$ is obtained from (5·4·7) by squaring. Then, from (5·4·5),

$$T(n) = np_0 + \tfrac12 + \tau_1 \sqrt{np_0(1-p_0)}. \tag{5·4·8}$$

Ordinarily, the value of the right-hand side of (5·4·8) will not be an integer. The difficulty is solved by rounding off the obtained value to the nearest integer.
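Formulas (5·4·7) and (5·4·8) translate directly into a short computation, with the Tables of the Normal Integral replaced by `statistics.NormalDist().inv_cdf`. The sketch below is hedged: rounding $n$ up (rather than to the nearest integer) is our choice, and the design values ($p_0 = 0.5$, $p' = 0.7$, $\alpha = .05$, $\beta = .90$) are illustrative only. Because the result rests on an approximation, it closes with an exact binomial check of $\alpha'$ and of the power.

```python
from math import ceil, comb, sqrt
from statistics import NormalDist

def normal_design(p0, p1, alpha, beta):
    """Approximate n and T(n) from (5.4.7)-(5.4.8) for H: p = p0 against p = p1 > p0."""
    G_inv = NormalDist().inv_cdf     # inverse of the normal integral G
    tau1 = G_inv(1 - alpha)          # 1 - G(tau1) = alpha
    tau2 = G_inv(1 - beta)           # 1 - G(tau2) = beta; negative when beta > 1/2
    root_n = (tau1 * sqrt(p0 * (1 - p0)) - tau2 * sqrt(p1 * (1 - p1))) / (p1 - p0)
    n = ceil(root_n ** 2)            # assumption: round n up
    T = round(n * p0 + 0.5 + tau1 * sqrt(n * p0 * (1 - p0)))   # (5.4.8), nearest integer
    return n, T

def upper_tail(n, p, T):
    """Exact P{X_n >= T | p}."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(T, n + 1))

n, T = normal_design(0.5, 0.7, alpha=0.05, beta=0.90)
print("n =", n, "T(n) =", T)
print("exact alpha' =", round(upper_tail(n, 0.5, T), 4),
      "exact power =", round(upper_tail(n, 0.7, T), 4))
```

The exact check is worth doing because rounding $T(n)$ to the nearest integer may leave $\alpha'$ somewhat above or below $\alpha$.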

PROBLEMS AND EXERCISES 

1. Consider the problem of the Lady tasting tea of subsection 5·2·3, and determine the requisite size $n$ of the experiment and the critical value $T(n)$ so that, approximately, the resulting test insures a level of significance $\alpha = .01$ and power $\beta = .9$ if the tasting ability of the Lady is measured by $p' = .75$.

2. One strain $S_1$ of a given plant blooms white while another strain $S_2$ blooms red. It is generally accepted that the two strains differ in respect to only one pair of genes $G$ and that the strain $S_1$ is pure recessive $gg$ while $S_2$ is pure dominant $GG$. However, a geneticist suspects that there may be several nonlinked pairs of genes involved, with the white color always characterizing the recessive gene. The geneticist is in possession of a certain amount of seeds collected from self-fertilized hybrids of the two strains $S_1$ and $S_2$ and wishes to use these seeds to test his presumption.



Specifically, $n$ seeds will be planted and the number $X_n$ of white-blooming plants counted. The hypothesis tested is $H$: the color of the blossoms depends upon only one pair of genes. On this hypothesis, the probability that one particular seed planted will yield a white-blooming plant is $p_0 = .25$. The alternative hypothesis $\bar H$ is that the number of independent pairs of genes determining the color of the flowers is $m \ge 2$. The probability that a particular single seed planted will yield a white-blooming plant depends on $m$ and is equal to, say, $p_m = (\tfrac14)^m$. Obtain the B.C.R.

Determine the number $n$ of seeds to be planted and the details of the subsequent test so as to conform with the chosen level of significance $\alpha = .01$ and so that the power of the test equals .85 if $m = 2$. Use the normal approximation to compute the power function for $m = 2, 3, 4$.

3. Same problem as in 2 but with $\alpha = .05$ and $\beta = .90$.

4. Consider Problem 2 of subsection 5-2*7 and compute the values of 
n and t such that (i) if X = Xo = 1, the probability of passing the water 
for production is approximately equal to a = .005, and (ii) if X = f, the 
probability of passing the water is approximately equal to .9. Compute 
the power function for a few values of X and make a graph. 

5. It is agreed between a seed producer and a customer that the price for a consignment of seeds will be determined after the following test. Two hundred and fifty seeds taken at random from the consignment will be planted. $X$ denotes the number of plants showing a certain recessive characteristic. If $X \le 25$, then the producer will receive a certain "normal" price. Otherwise, if $X > 25$, then the price for the consignment of seeds will be somewhat reduced. This agreement is of a more or less permanent character and the seed producer decides to adjust his selection and breeding policy accordingly. In fact, by using the method of mass selection described in subsection 3·3·4, the proportion $p$ of recessives in seeds put on the market may be reduced below any desired level. Consider $X$ as a binomial variable and determine that value of $p$ with which the probability that the normal price will be paid for the consignment is (approximately) equal to .95.

5·4·4. Hypotheses relating to the hypergeometric random variable. Industrial sampling inspection. One of the most important fields of application of mathematical statistics is industry geared to mass production. The idea seems to be due to one of the directors of the huge brewery in Dublin, Messrs. Arthur Guinness, Son and Co. Some fifty years ago, he happened to read a book on probability and thought that "there may be some money in it." He called upon Mr. William S. Gosset, then a junior employee of the brewery, and suggested that he might go for a year or two and study under Professor Karl Pearson at the then unique center of statistical research and instruction at University College, London. Mr. Gosset proved to be an enterprising and talented young man and,
after a relatively brief period of study, embarked on research and began 
to produce new theoretical results. Some of them were of considerable and 
immediate value to the brewery. Some other results, perhaps not so immediately
important to the brewery, proved to be of large theoretical
interest and, naturally, the problem of their publication arose. The brewery 
did not object to the publication of these results but thought it wise to 
insure that the publication could not be connected easily with the name 
of one of its employees, so that their competitors would not realize that 
statistics may have something to do with brewing. As a result, the scientific 
world was astounded by a series of papers in the journal Biometrika, many 
of first-class importance, published under the pseudonym “Student.” Why 
should a scholar hide his name? A wide range of speculation arose. One 
extreme was that “Student’s” real name was frivolous. Another extreme 
was the suggestion that the person hiding behind the pseudonym “Student” 
was the Prince of Wales, now Duke of Windsor, who at the time was 
Honorary President of the Royal Statistical Society. Some of the results 
obtained by “Student” are now classic and form an essential part of the 
more advanced study of statistics. 

The present expansion of statistical work in industry is largely due to Walter A. Shewhart [13], and to H. F. Dodge and H. G. Romig, all employees of the Bell Telephone Laboratories. In particular, Dodge and Romig wrote several articles [3] on sampling inspection in which they considered the concepts of two kinds of errors in testing statistical hypotheses and, actually, computed the probabilities of these errors. These concepts, then, were first born in the practical work of engineering, to be later rediscovered by theoreticians who elevated them to the role of basic ideas in modern statistical theory.

In the preceding subsections the reader has encountered several examples relating to industrial sampling. In the present subsection we will consider the problem more systematically. It reduces to tests of hypotheses relating to a hypergeometric random variable.

The hypergeometric random variable was defined in subsection 4·4·1. Let $S$ be a set composed of a total of $N_1 + N_2 = N$ objects, of which $N_1$ possess a distinctive property $A$ and the other $N_2$ the negation $\bar A$ of $A$. A group (a sample) of $n < N$ objects is selected out of the set $S$ and $X$ stands for the number of those objects within the group which have the property $A$. The method of selecting the sample is so devised that any possible group of $n$ objects has the same probability of being selected as any other. With these definitions and with this assumption, the variable $X$ is the hypergeometric variable as defined in subsection 4·4·1. The nonzero values of its frequency function are given by the formula, say,

$$p_X(k \mid N_1, N) = \frac{C_{N_1}^{k}\, C_{N_2}^{\,n-k}}{C_N^{n}}. \tag{5·4·9}$$

This formula is valid for all values of $k$ for which the symbols in the right-hand side of (5·4·9) have a meaning, i.e., for

$$0 \le k \le N_1 \quad\text{and}\quad 0 \le n - k \le N_2$$

or

$$n - N_2 \le k \le n.$$

In other words, for the probability $P\{X = k\}$ to be positive and represented by (5·4·9), $k$ must be at least equal to the greater of the numbers zero and $n - N_2$ and must not exceed the smaller of the numbers $n$ and $N_1$. We write this in symbols thus:

$$\max(0,\, n - N_2) \le k \le \min(n,\, N_1).$$


In industrial sampling $S$ is a consignment of some manufactured product, the objects marked $A$ are defective items in the lot and the sample is taken either as part of a contract between the consumer and the producer or, in the process of manufacturing, as a means of decreasing the frequency of stoppages due to defective parts. In either case we are interested in testing the hypothesis $H_1$, which asserts that the number $N_1$ of defective items is equal to a specified number $N_1^0$. The alternative composite hypothesis is $\bar H_1$, that the number of defective items is $N_1 < N_1^0$.

Remark: Actually the hypothesis tested should be, say, $H_1'$, asserting that $N_1 \ge N_1^0$. This hypothesis, however, is composite and, since thus far all our theoretical discussions relate to simple hypotheses, will not be considered in detail. However, the reader will have no difficulty in checking that the test of the hypothesis $H_1$ described below is quite satisfactory for testing the composite hypothesis $H_1'$.

It will be seen that the hypothesis $H_1$ is tested against a one-sided alternative composite hypothesis $\bar H_1$. Ordinarily the size $N$ of the consignment is large compared to the sample size $n$ and so we will assume $n < N_1^0 \le N_2^0$, with $N - N_1^0 = N_2^0$. In order to search for a uniformly most powerful test we select a simple hypothesis $h$ which belongs to $\bar H_1$ and asserts that $N_1 < N_1^0$, and build the λ-criterion. We write, say,


$$\lambda(k, N_1) = \frac{p_X(k \mid N_1^0, N)}{p_X(k \mid N_1, N)} = \frac{N_1^0!\,(N_1 - k)!\,(N - N_1^0)!\,(N - N_1 - n + k)!}{(N_1^0 - k)!\,N_1!\,(N - N_1^0 - n + k)!\,(N - N_1)!}.$$




If $k = 0$ this expression reduces to




$$\lambda(0, N_1) = \frac{(N - N_1^0)!}{(N - N_1^0 - n)!}\cdot\frac{(N - N_1 - n)!}{(N - N_1)!} = \frac{N - N_1^0}{N - N_1}\cdot\frac{N - N_1^0 - 1}{N - N_1 - 1}\cdots\frac{N - N_1^0 - n + 1}{N - N_1 - n + 1}$$


and it is seen that, if $N_1 < N_1^0$, all the factors on the right-hand side are less than unity. Therefore $\lambda(0, N_1)$ is necessarily less than unity. In order to have a clearer picture of the change in $\lambda(k, N_1)$ as $k$ increases, we compute the ratio, say $R(k)$, of two consecutive values of $\lambda(k, N_1)$, say

$$R(k) = \frac{\lambda(k, N_1)}{\lambda(k - 1, N_1)} = \frac{N_1^0 - k + 1}{N_1 - k + 1}\cdot\frac{N - N_1 - n + k}{N - N_1^0 - n + k}.$$


If $N_1^0 > N_1$, it is easily seen that each of the two factors on the right-hand side is greater than unity, no matter what the value of $k$ may be. It follows that $R(k) > 1$ and that, therefore,

$$\lambda(k - 1, N_1) < \lambda(k, N_1)$$

for all values of $k = 1, 2, \ldots, n$ and for all $N_1 < N_1^0$. We conclude that, whichever hypothesis $h$ belonging to $\bar H_1$ we choose to consider, the corresponding standard B.C.R. for testing $H_1$ against $h$ is always the same, consisting of several of the possible sample points with the least abscissae, $k = 0, 1, 2, \ldots, t$, where $t$ must be selected in accordance with the prescribed value of the level of significance $\alpha$. Of course, this choice of the critical region could be made and, actually, was made on purely intuitive grounds. In order to obtain the appropriate value of $t$ we should compute the probability, given $H_1$, that $X$ will not exceed $t$ and equate it to $\alpha$ (as nearly as the discreteness of $X$ permits),

$$P\{X \le t \mid N_1^0\} = \sum_{k=0}^{t} p_X(k \mid N_1^0, N) = \alpha.$$

These computations are very tedious when $N$ and $n$ are large. Dodge and Romig have provided useful tables from which the value of $t$ can be read for a considerable number of combinations of values of $\alpha$, $N$ and $N_1^0$.
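Where such tables are not at hand, the exact computation is quick by machine. The sketch below uses hypothetical values ($N = 1000$, $N_1^0 = 100$, $n = 50$, $\alpha = .05$, chosen only for illustration). It confirms numerically, for one alternative $N_1 = 50$, that $\lambda(k, N_1)$ increases with $k$, and then finds the greatest $t$ with $P\{X \le t \mid N_1^0\} \le \alpha$; reading "equate it to $\alpha$" as "not exceeding $\alpha$" is our assumption, and the function names are ours.

```python
from fractions import Fraction
from math import comb

def hyper_pmf(k, N1, N, n):
    """p_X(k | N1, N) of (5.4.9) for 0 <= k <= n; math.comb gives 0 if k > N1."""
    return Fraction(comb(N1, k) * comb(N - N1, n - k), comb(N, n))

def acceptance_number(N10, N, n, alpha):
    """Greatest t with P{X <= t | N1 = N10} <= alpha, and that probability.
    t = -1 means that even X = 0 is too probable under H1 at this level."""
    t, cum = -1, Fraction(0)
    while t < n:
        nxt = cum + hyper_pmf(t + 1, N10, N, n)
        if nxt > alpha:
            break
        t, cum = t + 1, nxt
    return t, cum

# Hypothetical lot: N = 1000 items, H1 asserts N1 = 100 defectives; sample of 50.
N, N10, n, alpha = 1000, 100, 50, Fraction(5, 100)

# lambda(k, N1) increases with k for an alternative N1 < N10 (here N1 = 50).
lam = [hyper_pmf(k, N10, N, n) / hyper_pmf(k, 50, N, n) for k in range(n + 1)]
assert all(a < b for a, b in zip(lam, lam[1:]))

t, size = acceptance_number(N10, N, n, alpha)
print("reject H1 (accept the lot) when X <=", t, "; exact alpha' =", round(float(size), 4))
```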

In subsection 4’5’2 we have seen that when AT, Nt , and Nt are large, 
then the hypergeometric frequency function does not differ very much 
from the binomial. Using this and also the Normal approximation to the 
binomial, approximate solutions of the problem of industrial sampling in- 



[5'4'5] HYPERGEOMETRIC VARIABLE ‘ 335 

spection can be obtained by treating them as if the hypotheses Hi and 
Hi were concerned with the binomial variable whose frequency function is 
generated by the expansion of (g + pu)", where 

(5-4-10) = 
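For large $N$ the same $t$ can be approximated along the lines just indicated. The sketch below assumes, following subsection 4·5·2, that under $H_1$ the number of defectives in the sample is treated as binomial with $p_0 = N_1^0/N$. It then applies the Normal approximation with a continuity correction of one half (the correction is our choice, not prescribed by the text): $P\{X \le t\} \approx G\big((t + \tfrac12 - np_0)/\sqrt{np_0q_0}\big)$, where $G$ is the Normal integral tabled in the Appendix. The function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def normal_approx_lower_t(p0, n, alpha):
    """Approximate largest t with P{X <= t} <= alpha for X ~ binomial(n, p0),
    using P{X <= t} ~ G((t + 0.5 - n*p0) / sqrt(n*p0*q0))."""
    q0 = 1.0 - p0
    z_alpha = NormalDist().inv_cdf(alpha)  # lower alpha-point of N(0, 1); negative for alpha < .5
    # Solve (t + 0.5 - n*p0) / sqrt(n*p0*q0) = z_alpha for t, then round down
    return math.floor(n * p0 + z_alpha * math.sqrt(n * p0 * q0) - 0.5)


# Hypothetical plan: sample n = 200, p0 = N1^0 / N = 0.10 under H1, alpha = .05
print(normal_approx_lower_t(0.10, 200, 0.05))
```

Rounding down keeps the approximate size on the safe side of $\alpha$; comparing the result with the exact hypergeometric computation is advisable when $p_0$ is small.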

PROBLEMS AND EXERCISES 

1. A consignment of $N = 10{,}000$ electric lamps is to be purchased by a consumer who insists on sampling inspection which will insure a probability of about .95 that, should the proportion of defective lamps be as large as 10 per cent, this fact will be detected by inspection and the price adjusted accordingly. The manufacturer believes that the proportion of defective lamps is 5 per cent (or less) and wishes that, whenever this proportion actually is 5 per cent, the probability of the lot passing the test be about .90. Use the approximations suggested above and devise a test satisfying both the consumer and the producer.

This problem is stated in the terms which may be expected from a producer and consumer who understand the situation but are not familiar with statistical terminology. Restate the problem in terms of statistical hypotheses, critical regions, power functions, etc.

2. Contracts regarding sampling-inspection plans are always a matter of bargaining. Suppose, then, that a contract for delivery of a consignment of electric lamps specifies the sample size $n = 200$ and that the consignment will be accepted by the consumer if the number $X$ of defective lamps in the sample does not exceed 30. Use the approximations indicated and determine what the actual proportion $p'$ of defective lamps in the consignment must be in order that the probability of the consumer accepting the consignment is about equal to .95.

5·4·5. Hypotheses relating to the hypergeometric random variable. Fish-tagging experiments. The frequency function $p_X(k \mid N_1, N)$ of the hypergeometric variable $X$, exhibited in (5·4·9), depends upon two parameters: $N$, the total number of objects forming the set $S$, and $N_1 = N - N_2$, the number of those objects which possess the distinctive property $A$. In subsection 5·4·4 we considered the case in which the value of $N$ is known but that of $N_1$ is subject to doubt. We deduced a test of the hypothesis $H_1$ ascribing to $N_1$ a specified value $N_1^0$. In the present subsection we will consider the reverse situation, where $N_1$ is a known number but $N = N_1 + N_2$ is unknown. In these circumstances we will deduce a test of the hypothesis, denoted by the letter $H$ without any subscript, ascribing to $N$ a specified value $N^0$. The hypothesis $H$ will be tested against a one-sided alternative: either the hypothesis $H'$ asserting that $N_1 \le N < N^0$ or the hypothesis $H''$ which asserts that $N > N^0$.




Interesting practical problems leading to tests of the hypothesis H are 
connected with wildlife control and, in particular, with the administration 
of fisheries. In certain cases it is desired to keep the number of individuals 
of a given species either below a certain specified level or above a specified 
level. Cases of the first kind refer to animals which are pests if allowed to 
multiply beyond reasonable limits. Cases of the second kind refer to commercially exploited populations of animals such as fish.

In order to watch the changes in the number of animals in question, a method of tagging has been developed. In its simplest form it consists of the following. A considerable number $N_1$ of animals are caught and marked, or tagged. The tagged animals are then set free. Subsequently, a further $n$ animals of the same population are caught. The letter $X$ denotes the number of those animals out of the $n$ caught which carry tags. This method was extensively applied to study the dwindling population of California sardines. It is also used in studies of populations of salmon, trout, ground squirrels, and even of rattlesnakes and tsetse flies.

Denote by $N$ the unknown total number of animals considered. It is obvious that the greater $N$ is compared to $N_1$, the smaller the number $X$ of tagged animals expected among the $n$ animals caught in the second catch. One is, then, inclined to use the observable value of $X$ to judge the value of $N$. Unfortunately, however, it is very difficult to construct an entirely convincing theory which would lead to a calculable frequency function of the variable $X$. The postulate ordinarily made is that, if a reasonably long interval of time is allowed to pass between the tagging of the animals and the subsequent catch, the probability that an animal is obtained in the second sample of $n$ is independent of whether or not it is tagged. If this is the case, then the variable $X$ can be considered as the hypergeometric random variable with its frequency function defined by (5·4·9). Among the many interesting works on this subject we will mention papers by Chapman [2] and by Howard [5].

Leaving aside the question of whether or not the treatment of the variable $X$ as a hypergeometric random variable will give results in close agreement with actual facts in wildlife studies, we will consider the problem of a uniformly most powerful test of the hypothesis $H$ above, tested against the one-sided alternative $H'$. The treatment of the similar situation in which the alternative hypothesis is $H''$ will be left to the reader.

In the majority of cases of actual study of wildlife, the number $N_1$ of tagged animals is smaller than the total number $n$ of animals caught in the second catch. In fact, in some cases, $N_1$ is minute compared to $n$. For example, this is true in the studies of the population of sardines, where $n$ stands for the total number of sardines caught commercially and processed by several packing companies cooperating in sardine research, and is expressed perhaps in hundreds of millions. In salmon research the disparity between $N_1$ and $n$ is not so great, but the inequality $N_1 < n$ is still a general rule. Consequently, in the typical situation the limits for the variable $X$ are

$$0 \le X \le N_1 < n,$$

and we will limit our considerations to this particular case.

Consider a particular simple hypothesis $h'$ belonging to $H'$, and denote by $N'$ the value it ascribes to $N$. Thus, $N_1 \le N' < N^0$. The $\lambda^*$-criterion corresponding to this hypothesis $h'$ will be denoted by $\lambda^*(k, N')$. We have

$$\lambda^*(k, N') = \frac{p_X(k \mid N_1, N^0)}{p_X(k \mid N_1, N')} = \frac{N'!\,(N^0 - N_1)!\,(N^0 - n)!\,(N' - N_1 - n + k)!}{N^0!\,(N' - N_1)!\,(N' - n)!\,(N^0 - N_1 - n + k)!}.$$

As in the preceding subsection, in order to study the dependence of $\lambda^*(k, N')$ on $k$ it is convenient to compute the quotient, say

$$Q(k, N') = \frac{\lambda^*(k, N')}{\lambda^*(k-1, N')}.$$

Easy algebra (all factorials that do not involve $k$ cancel) gives

$$Q(k, N') = \frac{N' - N_1 - n + k}{N^0 - N_1 - n + k}$$

for $k = 1, 2, \dots, N_1$. It is easy to see that, if $N' < N^0$, then $Q(k, N')$ is less than unity, irrespective of the value of $k$. It follows that, in the case considered,

$$\lambda^*(k-1, N') > \lambda^*(k, N'),$$

and, thus, that the standard B.C.R. for testing $H$ against $h'$ is the region, say $w'$, composed of those possible sample points with the greatest values of the abscissa, $k = N_1, N_1 - 1, \dots, T$ (say).
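The "easy algebra" behind $Q(k, N')$ can be checked with exact rational arithmetic. The following Python sketch is an illustration only, under hypothetical values of $N_1$, $n$, $N^0$ and $N'$ chosen so that $N_1 < n$ and $N' < N^0$: it evaluates $\lambda^*(k, N')$ directly from the hypergeometric probabilities, confirms that each ratio of consecutive values equals the closed form above, and that the sequence decreases in $k$.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, n1, N, n):
    """Exact p_X(k | N1, N) as a Fraction (assumes the point k is possible)."""
    return Fraction(comb(n1, k) * comb(N - n1, n - k), comb(N, n))


def lambda_star(k, n1, n, N0, N_prime):
    """lambda*(k, N') = p_X(k | N1, N0) / p_X(k | N1, N')."""
    return hypergeom_pmf(k, n1, N0, n) / hypergeom_pmf(k, n1, N_prime, n)


# Hypothetical values with N1 < n and N1 <= N' < N0
n1, n, N0, N_prime = 20, 50, 500, 400
values = [lambda_star(k, n1, n, N0, N_prime) for k in range(n1 + 1)]
for k in range(1, n1 + 1):
    Q = values[k] / values[k - 1]
    # Closed form: (N' - N1 - n + k) / (N0 - N1 - n + k)
    assert Q == Fraction(N_prime - n1 - n + k, N0 - n1 - n + k)
    assert Q < 1  # hence lambda*(k - 1, N') > lambda*(k, N')
print("lambda* decreases in k; the B.C.R. takes the largest k first")
```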

The value of $T$ must be chosen so as to conform with the prescribed level of significance $\alpha$, i.e., so that, at least approximately,

$$P\{X \ge T \mid N_1, N^0\} = \sum_{k=T}^{N_1} p_X(k \mid N_1, N^0) = \alpha. \tag{5·4·11}$$

Since this region $w'$ is the B.C.R. irrespective of the particular hypothesis $h'$, provided that $h'$ belongs to $H'$, it follows that $w'$ is the uniformly most powerful critical region for testing $H$ against $H'$. The reader may wish to make the necessary changes in the above reasoning to make it applicable to the case where $n < N_1$.
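Equation (5·4·11) can be solved for $T$ by accumulating the upper tail from $k = N_1$ downwards. This Python sketch is one possible implementation, not the author's procedure. It works with logarithms of factorials (`math.lgamma`) so that large tagging experiments do not overflow, and it adopts the assumed convention of returning the smallest $T$ whose tail probability under $H$ does not exceed $\alpha$. The function names and example values are hypothetical.

```python
from math import exp, lgamma


def log_comb(a, b):
    """Natural logarithm of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_pmf(k, n1, N, n):
    """p_X(k | N1, N), computed in log space to avoid overflow."""
    if k < max(0, n - (N - n1)) or k > min(n, n1):
        return 0.0
    return exp(log_comb(n1, k) + log_comb(N - n1, n - k) - log_comb(N, n))


def upper_critical_value(n1, n, N0, alpha):
    """Smallest T with P{X >= T | N1, N0} <= alpha (critical region k = T..N1).
    Returns None if even the largest possible k has probability above alpha."""
    tail = 0.0
    T = None
    for k in range(min(n, n1), -1, -1):
        tail += hypergeom_pmf(k, n1, N0, n)
        if tail > alpha:
            break
        T = k
    return T


# Hypothetical experiment: N1 = 50 tagged, n = 200 caught, H: N0 = 1000
T = upper_critical_value(n1=50, n=200, N0=1000, alpha=0.05)
print("reject H in favor of H' when X >=", T)
```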

The problem of testing hypotheses regarding animal populations using tagging experiments arises only when these populations are so large that a complete census is impractical. Therefore we may safely assume that $N^0$ and $N'$ are large numbers. Also, in order to have a reasonable chance of success, $N_1$ and $n$ must be large. Therefore, we are likely to be in the situation where the approximation provided by substituting the binomial distribution for the hypergeometric gives accurate results. After this start on the road of approximations one is tempted to proceed and to substitute for the sum in (5·4·10) the Normal integral. However, occasionally there are difficulties due to the fact that $N_1$ is a very small fraction of $N^0$. Thus the value of $p = N_1/N^0$ in the binomial distribution which approximates the hypergeometric is very small and, as was illustrated in subsection 4·5·5, the approximation to the binomial provided by the Normal integral may be unsatisfactory. Difficulties of this kind were extensively discussed by Chapman [2].

When $p = N_1/N^0$ is not very small, the value of $T$ ensuring that the left-hand side of (5·4·11) is approximately equal to $\alpha$, and also the approximate values of the power function of the test, may be obtained using the Normal approximation as indicated in subsection 5·4·4.
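A sketch of this Normal approximation, under stated assumptions: $X$ is treated as binomial with $n$ trials and $p = N_1/N$, a continuity correction of one half is used, and $T$ is rounded up so that the approximate size does not exceed $\alpha$ (both choices are ours, not the text's). The same formula, evaluated with $p = N_1/N$ for values $N < N^0$, gives approximate values of the power function $\beta(N)$. As noted above, the results should be distrusted when $p$ is very small. The function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

G = NormalDist().cdf  # the Normal integral G(t)


def approx_T(n1, n, N0, alpha):
    """Smallest integer T with 1 - G((T - 0.5 - n p) / sqrt(n p q)) <= alpha, p = N1 / N0."""
    p = n1 / N0
    z = NormalDist().inv_cdf(1.0 - alpha)  # upper alpha-point of N(0, 1)
    return math.ceil(n * p + z * math.sqrt(n * p * (1.0 - p)) + 0.5)


def approx_power(T, n1, n, N):
    """beta(N) ~ P{X >= T} when X is treated as binomial(n, N1 / N)."""
    p = n1 / N
    return 1.0 - G((T - 0.5 - n * p) / math.sqrt(n * p * (1.0 - p)))


# Hypothetical experiment: N1 = 50, n = 200, H: N0 = 1000, alpha = .05
T = approx_T(50, 200, 1000, 0.05)
for N in (800, 900, 1000):
    print(N, round(approx_power(T, 50, 200, N), 3))
```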

PROBLEMS AND EXERCISES 

1. In a tagging experiment conducted in British Columbia in 1938, a total of $N_1 = 7809$ salmon swimming into Cultus Lake were tagged and released. Subsequently a sample of $n = 13{,}679$ salmon which spawned and died in Cultus Lake was inspected for tags. Of these, $X' = 1529$ salmon had been tagged.

Consider the hypothesis $H$ that the population of salmon which spawned and died in Cultus Lake comprised a total of $N^0 = 80{,}000$ fish. Deduce the uniformly most powerful test of the hypothesis $H$ against the alternative $H'$ which asserts that $N < 80{,}000$. Use the level of significance $\alpha = .05$.

Does this test reject the hypothesis $H$ with $X = X' = 1529$? Use the Normal approximation to compute the power function, say $\beta(N)$, of the test for values of $N = 70{,}000$, $75{,}000$, and $78{,}000$. Make a plot of $\beta(N)$.

2. In 1938, Dr. Sato, the Japanese biologist, tagged $N_1 = 1358$ salmon of a particular species in the Western North Pacific. Subsequently the Japanese fisheries caught $12{,}339{,}000$ salmon from the same population. Of these, $X' = 177$ salmon had been tagged. Use the level of significance $\alpha = .05$ and test the hypothesis $H$ that the number $N$ of salmon in the population studied is $N^0 = 90{,}000{,}000$. The alternative hypothesis $H''$ asserts that $N > N^0$. Use the Normal approximation and compute the power of the test for values of $N = 0.95 \times 10^8$, $1.00 \times 10^8$, and $1.05 \times 10^8$.

5·4·6. Lambda-principle of testing hypotheses. The $\lambda^*$-criterion was introduced in subsection 5·3·2 on a completely rational basis, as a means of constructing critical regions which combine a given value of the probability of error of the first kind with the minimum value of the probability of error of the second kind. This treatment of the problem of testing hypotheses goes back to 1933 [9]. However, in 1928 an intuitive principle had already been formulated [10], the so-called $\lambda$-principle, which, in its application to tests of a simple hypothesis against a simple alternative, reduces essentially to the use of the $\lambda^*$-criterion.

This $\lambda$-principle (note the absence of the asterisk, which emphasizes the distinction between the $\lambda$-principle and the $\lambda^*$-criterion connected with the B.C.R.'s) was formulated as a result of a detailed analysis of conditions in which we are inclined, on intuitive grounds, to doubt that a given statistical hypothesis $H$ is true. Discussions of this kind are due to the celebrated French mathematicians Joseph Bertrand and Émile Borel. The book by Borel [1] under the title "Le Hasard" is particularly recommended to the reader. A brief critical outline of the history of the basic ideas in testing statistical hypotheses can be found in the present author's "Lectures and Conferences on Mathematical Statistics" [7].

Under pressure from various fields of application, tests of statistical 
hypotheses have been performed by many scholars for a long time. The 
author has found a remarkable example of a test of a statistical hypothesis 
in a memoir by Laplace published in 1773 [6]. However, the general ideas 
behind the tests remained obscure. Bertrand examined the procedure 
usually applied and came to the pessimistic conclusion that the usefulness 
of tests is an illusion. Briefly, the usual procedure, which he criticized, 
consisted of selecting a function of the observable random variables and 
using it as a criterion to judge the hypothesis tested. If, given this hypothesis, the observed value of the criterion had small probability of being observed, the hypothesis was subjected to doubt. Otherwise it was considered that there is no reason to doubt the hypothesis under consideration. Bertrand noticed that in many cases it is possible to devise two or more different criteria which, if used in the manner just described, give contradictory verdicts. This discovery suggested serious doubts as to the possibility of testing statistical hypotheses in general.

While admitting the validity of this criticism of tests based on criteria selected more or less accidentally, Borel suggested that there may exist special criteria "en quelque sorte remarquables" (in some sense remarkable), whose usefulness could be established rigorously. Among other things, Borel insisted on the sound idea that the choice of the criterion to be used must be made before the observations are made or, at least, independently of the outcome of the observations.

The formulation of the X-principle was the result of an effort to discover 
what kind of criterion would answer Borel’s requirement that it be “en 
quelque sorte remarquable” so that its value would gauge our confidence 
in the hypothesis tested. 

The first idea in this direction was that our confidence in a hypothesis $H$, given an observed sample point $E = E'$, may be measured by the probability $p_E(E' \mid H)$, determined by $H$, that the observations will give the result $E'$. Rule $R_1$ in subsection 5·2·7 is an example of the application of this idea, and the reader will have noticed that the resulting test is not the best possible. This failure is easy to explain on intuitive grounds.

First of all, we may mention that in many cases the sample space W 
contains a very large number of possible sample points, and the individual 
probabilities of all these points are very small on any admissible hypothesis, 
including the one under test. Yet, the observations must yield some results, 
and thus one of the highly improbable events must occur. Furthermore, 
it may happen that, while the probability of the observed sample point 
determined by the hypothesis tested is very small, those determined by 
any one of the admissible alternative hypotheses are even smaller. 

Suppose, for example, that on the hypothesis tested $H$, the probability of the actually observed sample point $E'$ is, say, one in a million, which is certainly a very small number. Suppose further that whatever alternative simple admissible hypothesis $h$ we consider, the corresponding probability of the same sample point $E'$ is less than one in ten million. In these circumstances, do we really have intuitive reasons for doubting $H$? The answer is, obviously, "no," and we immediately notice the relevance of the set $\Omega$ of admissible hypotheses. We notice further that our intuitive attitude towards the hypothesis $H$ would change radically if, among the admissible simple hypotheses $h$, there was one ascribing to $E'$ a probability much greater than that ascribed to $E'$ by the hypothesis tested $H$. In fact, if the observations yielded $E'$, which has the probability of one in a million under $H$, while on one of the alternative hypotheses the probability of $E'$ is as large as one in ten, then we may expect the unanimous opinion that $H$ should be abandoned in favor of that alternative or in favor of some other admissible hypothesis which ascribes to $E'$ reasonably large probability. The above intuitive discussion is the key to the formulation of the $\lambda$-principle.

Lambda-Principle for Testing Simple Hypotheses. If $H$ is a simple hypothesis concerning the random sample point $E$ and if $\Omega$ is the set of admissible simple hypotheses $h$, then the appropriate criterion for testing $H$ against $\Omega$ is the quotient, say

$$\lambda(e) = \frac{p_E(e \mid H)}{p_{\max}(e \mid \Omega)}, \tag{5·4·12}$$

where $e$ stands for a possible sample point in the sample space considered and $p_{\max}(e \mid \Omega)$ is the greatest probability* of $e$ out of those ascribed to it by the admissible simple hypotheses of the set $\Omega$.

*More precisely, the least upper bound of the probability.

In order to apply this principle, one performs the observations and obtains the observed sample point, $E'$. The coordinates of this point are substituted into (5·4·12) and the value of $\lambda(E')$ computed. The hypothesis $H$ is rejected if $\lambda(E')$ is "too small." The limit $t$ between the values of $\lambda(e)$ which are "too small" and those which are not is set so as to conform with the level of significance $\alpha$ chosen in advance. It will be seen that the application of the $\lambda$-principle leads to the rejection of $H$ only if among the admissible simple hypotheses there is at least one which ascribes to the observed sample point a larger probability than that ascribed by $H$. Obviously, the use of the $\lambda$-principle is equivalent to the use of the critical region which includes all possible sample points $e'$ for which $\lambda(e') < t$, none of those points $e''$ for which $\lambda(e'') > t$, and, perhaps, some or all of those points $e'''$ for which $\lambda(e''') = t$.
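The recipe can be illustrated numerically. In the following Python sketch, our own illustration with hypothetical numbers, $n = 12$ independent trials are observed, the simple hypothesis tested is $H$: $p = 0.5$, and the set $\Omega$ of admissible simple hypotheses is assumed to consist of just three values of $p$, namely 0.3, 0.5 and 0.7. Since every admissible probability of a particular sequence of outcomes depends only on the number $k$ of successes (the binomial coefficient cancels in the quotient), $\lambda$ is evaluated as a function of $k$. Points are then added to the critical region in order of increasing $\lambda$ as long as the probability of the region under $H$ stays at or below $\alpha$.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lambda_simple(k, n, p_H, omega):
    """(5·4·12): p(k | H) divided by the greatest p(k | h) over the simple hypotheses h in omega."""
    return binom_pmf(k, n, p_H) / max(binom_pmf(k, n, p) for p in omega)


def lambda_region(n, p_H, omega, alpha):
    """Admit sample points in order of increasing lambda while P{region | H} <= alpha."""
    lam = {k: lambda_simple(k, n, p_H, omega) for k in range(n + 1)}
    region, size = [], 0.0
    for k in sorted(lam, key=lam.get):  # smallest lambda first
        if size + binom_pmf(k, n, p_H) > alpha:
            break
        region.append(k)
        size += binom_pmf(k, n, p_H)
    return sorted(region), size, lam


# Hypothetical setting: H is p = 0.5; omega also admits p = 0.3 and p = 0.7
region, size, lam = lambda_region(12, 0.5, [0.3, 0.5, 0.7], alpha=0.05)
print("critical region:", region, "size:", round(size, 4))
```

Because $H$ itself belongs to $\Omega$ here, every value of $\lambda$ lies between zero and unity.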

Lambda-Principle for Testing Composite Hypotheses. If $H$ is a composite hypothesis concerning the random sample point $E$ and if $\Omega$ is the set of admissible simple hypotheses $h$, then the appropriate criterion for testing $H$ against $\Omega$ is the quotient, say

$$\lambda(e) = \frac{p_{\max}(e \mid H)}{p_{\max}(e \mid \Omega)}, \tag{5·4·13}$$

where $e$ stands for a possible sample point, $p_{\max}(e \mid H)$ is the greatest probability* of $e$ out of those ascribed to it by the simple hypotheses belonging to $H$, and $p_{\max}(e \mid \Omega)$ is the greatest probability* of $e$ ascribed by any admissible simple hypothesis of the set $\Omega$.

In order to apply this principle it is necessary to perform the observations and obtain the coordinates of the observed sample point $E'$. These coordinates are substituted into (5·4·13). If the resulting value $\lambda(E')$ is "too small," then the hypothesis tested $H$ is rejected. Otherwise, it is accepted. The limit $t$ between the values of $\lambda(e)$ which are "too small" and those which are not is so determined that, if the hypothesis $H$ is true (whichever of its simple hypotheses holds), then the probability of $H$ being rejected by the test does not exceed a limit equal (at least approximately) to the chosen level of significance $\alpha$.
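A corresponding sketch for a composite hypothesis, again with hypothetical numbers and choices of our own: $n = 20$ trials, the composite hypothesis $H$ asserts $p \le 0.5$, and both $H$ and $\Omega$ are represented by finite grids of $p$ values in steps of 0.05 (a stand-in for the least upper bounds over intervals). Because $H$ is composite, the size of the region is taken as the greatest probability of rejection over the simple hypotheses in $H$, and points are admitted in order of increasing $\lambda$ while that size stays at or below $\alpha$.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def lambda_composite(k, n, H_grid, omega_grid):
    """(5·4·13): greatest p(k | h) over h in H divided by greatest p(k | h) over h in Omega."""
    numerator = max(binom_pmf(k, n, p) for p in H_grid)
    denominator = max(binom_pmf(k, n, p) for p in omega_grid)
    return numerator / denominator


def lambda_region_composite(n, H_grid, omega_grid, alpha):
    lam = {k: lambda_composite(k, n, H_grid, omega_grid) for k in range(n + 1)}
    region = []
    for k in sorted(lam, key=lam.get):  # smallest lambda first
        trial = region + [k]
        # Size of a test of a composite H: the worst case over its simple hypotheses
        size = max(sum(binom_pmf(j, n, p) for j in trial) for p in H_grid)
        if size > alpha:
            break
        region = trial
    return sorted(region), lam


# Hypothetical grids: H is p in {0, .05, ..., .5}; Omega is p in {0, .05, ..., 1}
H_grid = [i / 20 for i in range(11)]
omega_grid = [i / 20 for i in range(21)]
region, lam = lambda_region_composite(20, H_grid, omega_grid, alpha=0.05)
print("critical region:", region)
```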

It will be seen that whenever, among the admissible simple hypotheses, there is at least one which ascribes to the observed sample point a probability much larger than even the largest probability consistent with the hypothesis tested $H$, the use of the test based on the $\lambda$-principle ($\lambda$-test, for short) will lead to the rejection of $H$. This is in perfect agreement with what one is inclined to do on purely intuitive grounds.

Whether the hypothesis tested is simple or composite, it is obvious that $\lambda(e)$ is a quantity between zero and unity. As is the case with the $\lambda^*$-criterion, the sole purpose of $\lambda(e)$ is to sort the possible sample points according to the relative desirability of their being included in the critical region. The smaller $\lambda(e)$, the more desirable is the corresponding possible sample point. Therefore, instead of using the quotient $\lambda(e)$, the same purpose may be achieved by using any strictly increasing function of $\lambda(e)$. If both the hypothesis tested $H$ and its negation $\bar H$ are simple, then $\lambda(e)$ coincides with $\lambda^*(e)$ at all those points $e$ for which $\lambda^*(e) < 1$. Otherwise, when $\lambda^*(e) \ge 1$, the value of $\lambda(e)$ is unity; in other words, $\lambda(e) = \min\{1, \lambda^*(e)\}$. In the majority of cases, the level of significance $\alpha$ is a small number and, therefore, the corresponding standard B.C.R. includes only points for which $\lambda^*(e) < 1$. In all these cases the critical region determined by the $\lambda$-principle coincides with the best critical region for testing $H$ against $\bar H$.
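A small numerical check of this statement, with hypothetical numbers chosen only for illustration: $n = 10$ trials, $H$: $p = 0.5$ and $\bar H$: $p = 0.8$, so that $\Omega$ consists of these two simple hypotheses. The Python sketch verifies that $\lambda(e) = \min\{1, \lambda^*(e)\}$ and that, for $\alpha = .05$, ordering the sample points by $\lambda$ or by $\lambda^*$ yields the same critical region.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p_H, p_alt, alpha = 10, 0.5, 0.8, 0.05  # hypothetical values

# lambda*: ratio of the probabilities under H and under the simple alternative
lam_star = {k: binom_pmf(k, n, p_H) / binom_pmf(k, n, p_alt) for k in range(n + 1)}
# lambda: probability under H divided by the greater of the two probabilities
lam = {k: binom_pmf(k, n, p_H) / max(binom_pmf(k, n, p_H), binom_pmf(k, n, p_alt))
       for k in range(n + 1)}


def region_by(criterion):
    """Admit points in order of increasing criterion while P{region | H} <= alpha."""
    region, size = [], 0.0
    for k in sorted(criterion, key=criterion.get):
        if size + binom_pmf(k, n, p_H) > alpha:
            break
        region.append(k)
        size += binom_pmf(k, n, p_H)
    return sorted(region)


print(region_by(lam_star), region_by(lam))
```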

If the value of $\alpha$ is so large that the corresponding standard B.C.R. contains points with $\lambda^*(e) \ge 1$, then this B.C.R. is simply one of the critical regions determined by the $\lambda$-principle.

If the hypothesis tested $H$ is simple but the alternative $\bar H$ is composite, and if among the standard B.C.R.'s there exists a uniformly most powerful critical region, then the $\lambda$-principle will determine this region. On the other hand, if no uniformly most powerful critical region exists among the standard B.C.R.'s, then the $\lambda$-principle provides a reasonable compromise between the conflicting tendencies of testing $H$ most effectively against all possible simple admissible hypotheses at once. The same applies to cases where the hypothesis tested is composite.

In a remarkable memoir Wald [14] has proved certain optimum properties of tests based on the $\lambda$-principle. Unfortunately, these properties are too delicate to be discussed here. Some illustrations of the use of the $\lambda$-principle will be found in Volume 2.

PROBLEMS AND EXERCISES 

1. Consider the problem of the Lady tasting tea as described in subsection 5·2·3. In particular, consider the hypothesis $H$ that the Lady has no discriminating ability and the test of this hypothesis against the two-sided set of admissible hypotheses $\Omega_1$. Put $n = 10$, deduce the value of the $\lambda$-criterion, say $\lambda(k)$, compute its values for $k = 0, 1, 2, \dots, 10$, and make a plot. Determine the critical region based on the $\lambda$-principle so as to conform with the level of significance $\alpha = .05$. Compute and plot the power function of the test deduced.

2. In Shangri-la the automobile licenses bear consecutive numbers $1, 2, \dots, N$, so that $N$ stands for the total number of automobiles in the country. We will take it for granted that all the automobiles travel about with equal frequency and independently of each other, so that seeing any particular car at a given moment does not influence the probability of seeing it again in the next moment nor the probability of seeing any other car. You are a visitor in Shangri-la and are interested in the hypothesis $H$ that the number of cars $N$ is a given number $N^0$. The alternative hypothesis $\bar H$ is two-sided and asserts merely that $N \ne N^0$.

Your observations will yield $n$ license numbers $X_1, X_2, \dots, X_n$. The above hypotheses imply that, whatever system of positive integers $k_1, k_2, \dots, k_n$ is considered, different or not but all not exceeding $N$, the probability

$$P\{(X_1 = k_1)(X_2 = k_2)\cdots(X_n = k_n)\} = \left(\frac{1}{N}\right)^n.$$

Let $L$ stand for the greatest of $X_1, X_2, \dots, X_n$ and let $t$ be a fixed positive number less than $N^0$. Prove that the rule of rejecting $H$ whenever $L$ exceeds $N^0$ and also whenever $L \le t$ constitutes the uniformly most powerful test of the hypothesis $H$ against $\bar H$. Solve the distribution problem involved and determine $t$ so as to conform with the level of significance $\alpha$. After obtaining all the necessary formulae, put $N^0 = 20$, $n = 5$, $\alpha = .05$, and determine the numerical value of $t$. Compute and plot the power function of the test for $N = 1, 2, \dots, 20, 25, 30, 35, 40$.

REFERENCES 

1. Émile Borel, Le Hasard. Paris: Alcan, 1914.

2. D. G. Chapman, "Some properties of the hypergeometric distribution and their applications to sample censuses." To be published.

3. H. F. Dodge and H. R. Romig, "A method of sampling inspection." Bell System Tech. J., Vol. 8 (1929), p. 613.

4. R. A. Fisher, The Design of Experiments. Edinburgh: Oliver and Boyd, 1942.

5. G. V. Howard, "A study of the tagging method in the enumeration of sockeye salmon populations." International Pacific Salmon Fisheries Commission, No. 2 (1948).

6. Pierre-Simon, marquis de Laplace, "Mémoire sur l'inclinaison moyenne des orbites des comètes." Oeuvres Complètes, Vol. 8, p. 279. Paris: Gauthier-Villars, 1891.

7. J. Neyman, Lectures and Conferences on Mathematical Statistics. Graduate School, U.S. Dept. Agric., Washington, 1938.

8. J. Neyman, "On a new class of 'contagious' distributions." Ann. Math. Stat., Vol. 10 (1939), p. 35.

9. J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses." Phil. Trans. A, Vol. 231 (1933), p. 289.

10. J. Neyman and E. S. Pearson, "On the use and interpretation of certain test criteria for purposes of statistical inference." Biometrika, Vol. 20A (1928), pp. 175 and 263.

11. Karl Pearson (Editor), Tables of the Incomplete Beta-Function. Cambridge University Press, 1934.

12. G. Pólya, "Sur quelques points de la théorie des probabilités." Ann. Inst. H. Poincaré, Vol. 1 (1931), p. 117.

13. W. A. Shewhart, The Economic Control of Quality of a Manufactured Product. New York: Van Nostrand, 1931.

14. Abraham Wald, "Tests of statistical hypotheses concerning several parameters when the number of observations is large." Am. Math. Soc. Trans., Vol. 54 (1943), p. 426.




Appendix

TABLES OF THE NORMAL INTEGRAL

Direct Table of the Normal Integral

The quantity tabled is

$$H(t) = G(t) - G(0) = \frac{1}{\sqrt{2\pi}} \int_0^t e^{-x^2/2}\,dx.$$

Thus $H(t)$ is equal to the area under the Normal curve bounded by the verticals at zero and at $t$.

t     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0   .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1   .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2   .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3   .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4   .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5   .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6   .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7   .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8   .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9   .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0   .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1   .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2   .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3   .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4   .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5   .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6   .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545
1.7   .4554  .4564  .4573  .4582  .4591  .4599  .4608  .4616  .4625  .4633
1.8   .4641  .4649  .4656  .4664  .4671  .4678  .4686  .4693  .4699  .4706
1.9   .4713  .4719  .4726  .4732  .4738  .4744  .4750  .4756  .4761  .4767
2.0   .4772  .4778  .4783  .4788  .4793  .4798  .4803  .4808  .4812  .4817
2.1   .4821  .4826  .4830  .4834  .4838  .4842  .4846  .4850  .4854  .4857
2.2   .4861  .4864  .4868  .4871  .4875  .4878  .4881  .4884  .4887  .4890
2.3   .4893  .4896  .4898  .4901  .4904  .4906  .4909  .4911  .4913  .4916
2.4   .4918  .4920  .4922  .4925  .4927  .4929  .4931  .4932  .4934  .4936
2.5   .4938  .4940  .4941  .4943  .4945  .4946  .4948  .4949  .4951  .4952
2.6   .4953  .4955  .4956  .4957  .4959  .4960  .4961  .4962  .4963  .4964
2.7   .4965  .4966  .4967  .4968  .4969  .4970  .4971  .4972  .4973  .4974
2.8   .4974  .4975  .4976  .4977  .4977  .4978  .4979  .4979  .4980  .4981
2.9   .4981  .4982  .4982  .4983  .4984  .4984  .4985  .4985  .4986  .4986
3.0   .4987  .4987  .4987  .4988  .4988  .4989  .4989  .4989  .4990  .4990
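The entries of the direct table can be reproduced by machine. The following Python sketch is our illustration, not part of the original tables: it uses the identity $H(t) = \tfrac12\,\mathrm{erf}(t/\sqrt{2})$, where erf is the error function provided by Python's standard `math` module, and rounds to four decimals as in the table.

```python
import math


def H(t):
    """Area under the standard Normal curve between 0 and t."""
    return 0.5 * math.erf(t / math.sqrt(2.0))


# Reprint one row of the direct table, e.g. t = 1.60, 1.61, ..., 1.69
row = [round(H(1.6 + j / 100), 4) for j in range(10)]
print(row)
```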







Inverse Table of the Normal Integral

The quantity tabled is the value of $t$ corresponding to the argument

$$H(t) = \frac{1}{\sqrt{2\pi}} \int_0^t e^{-x^2/2}\,dx.$$

H(t)  .000   .001   .002   .003   .004   .005   .006   .007   .008   .009
.00   .0000  .0025  .0050  .0075  .0100  .0125  .0150  .0175  .0201  .0226
.01   .0251  .0276  .0301  .0326  .0351  .0376  .0401  .0426  .0451  .0476
.02   .0502  .0527  .0552  .0577  .0602  .0627  .0652  .0677  .0702  .0728
.03   .0753  .0778  .0803  .0828  .0853  .0878  .0904  .0929  .0954  .0979
.04   .1004  .1030  .1055  .1080  .1105  .1130  .1156  .1181  .1206  .1231
.05   .1257  .1282  .1307  .1332  .1358  .1383  .1408  .1434  .1459  .1484
.06   .1510  .1535  .1560  .1586  .1611  .1637  .1662  .1687  .1713  .1738
.07   .1764  .1789  .1815  .1840  .1866  .1891  .1917  .1942  .1968  .1993
.08   .2019  .2045  .2070  .2096  .2121  .2147  .2173  .2198  .2224  .2250
.09   .2275  .2301  .2327  .2353  .2378  .2404  .2430  .2456  .2482  .2508
.10   .2533  .2559  .2585  .2611  .2637  .2663  .2689  .2715  .2741  .2767
.11   .2793  .2819  .2845  .2871  .2898  .2924  .2950  .2976  .3002  .3029
.12   .3055  .3081  .3107  .3134  .3160  .3186  .3213  .3239  .3266  .3292
.13   .3319  .3345  .3372  .3398  .3425  .3451  .3478  .3505  .3531  .3558
.14   .3585  .3611  .3638  .3665  .3692  .3719  .3745  .3772  .3799  .3826
.15   .3853  .3880  .3907  .3934  .3961  .3989  .4016  .4043  .4070  .4097
.16   .4125  .4152  .4179  .4207  .4234  .4261  .4289  .4316  .4344  .4372
.17   .4399  .4427  .4454  .4482  .4510  .4538  .4565  .4593  .4621  .4649
.18   .4677  .4705  .4733  .4761  .4789  .4817  .4845  .4874  .4902  .4930
.19   .4959  .4987  .5015  .5044  .5072  .5101  .5129  .5158  .5187  .5215
.20   .5244  .5273  .5302  .5330  .5359  .5388  .5417  .5446  .5476  .5505
.21   .5534  .5563  .5592  .5622  .5651  .5681  .5710  .5740  .5769  .5799
.22   .5828  .5858  .5888  .5918  .5948  .5978  .6008  .6038  .6068  .6098
.23   .6128  .6158  .6189  .6219  .6250  .6280  .6311  .6341  .6372  .6403
.24   .6433  .6464  .6495  .6526  .6557  .6588  .6620  .6651  .6682  .6713
.25   .6745  .6776  .6808  .6840  .6871  .6903  .6935  .6967  .6999  .7031
.26   .7063  .7096  .7128  .7160  .7192  .7225  .7257  .7290  .7323  .7356
.27   .7388  .7421  .7454  .7488  .7521  .7554  .7588  .7621  .7655  .7688
.28   .7722  .7756  .7790  .7824  .7858  .7892  .7926  .7961  .7995  .8030
.29   .8064  .8099  .8134  .8169  .8204  .8239  .8274  .8310  .8345  .8381
.30   .8416  .8452  .8488  .8524  .8560  .8596  .8633  .8669  .8705  .8742





APPENDIX • 

Inverse Table of the Normal Integral {Continiied) 


347 


m)     .000   .001   .002   .003   .004   .005   .006   .007   .008   .009
.31   .8779  .8816  .8853  .8890  .8927  .8965  .9002  .9040  .9078  .9116
.32   .9154  .9192  .9230  .9269  .9307  .9346  .9385  .9424  .9463  .9502
.33   .9542  .9581  .9621  .9661  .9701  .9741  .9782  .9822  .9863  .9904
.34   .9945  .9986  1.003  1.007  1.011  1.015  1.019  1.024  1.028  1.032
.35   1.036  1.041  1.045  1.049  1.054  1.058  1.062  1.067  1.071  1.076
.36   1.080  1.085  1.089  1.094  1.098  1.103  1.108  1.112  1.117  1.122
.37   1.126  1.131  1.136  1.141  1.146  1.150  1.155  1.160  1.165  1.170
.38   1.175  1.180  1.185  1.190  1.195  1.200  1.206  1.211  1.216  1.221
.39   1.226  1.232  1.237  1.243  1.248  1.254  1.259  1.265  1.270  1.276
.40   1.282  1.287  1.293  1.299  1.305  1.311  1.316  1.322  1.328  1.335
.41   1.341  1.347  1.353  1.360  1.366  1.372  1.379  1.385  1.392  1.398
.42   1.405  1.412  1.419  1.426  1.432  1.440  1.447  1.454  1.461  1.468
.43   1.476  1.483  1.491  1.498  1.506  1.514  1.522  1.530  1.538  1.546
.44   1.555  1.563  1.572  1.580  1.589  1.598  1.607  1.616  1.626  1.635
.45   1.645  1.655  1.665  1.675  1.685  1.695  1.706  1.717  1.728  1.739
.46   1.751  1.762  1.774  1.787  1.799  1.812  1.825  1.838  1.852  1.866
.47   1.881  1.896  1.911  1.927  1.943  1.960  1.977  1.995  2.014  2.034
.48   2.054  2.075  2.097  2.120  2.144  2.170  2.197  2.226  2.257  2.290
.49   2.326  2.366  2.409  2.457  2.512  2.576  2.652  2.748  2.878  3.090
.50   ∞
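Read against the tabulated values (for instance, .25 gives .6745 and .475 gives 1.960), each entry is the x for which the area under the standard normal curve between 0 and x equals the argument formed by the row label plus the column heading (e.g. .47 + .005 = .475). Writing Φ for the standard normal distribution function (a symbol assumed here, not taken from this page), that means x = Φ⁻¹(1/2 + P). The following is a minimal Python sketch of that relation, offered as an illustration rather than as the procedure used to compute the printed table; the function name `inverse_normal_integral` is hypothetical, and the sketch relies on `statistics.NormalDist.inv_cdf` from the Python standard library (3.8 or later).

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
_STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def inverse_normal_integral(p: float) -> float:
    """Return x such that the standard normal area between 0 and x equals p.

    Hypothetical helper: assumes the table's convention that the
    argument p is the area from 0 to x, so 0 <= p < 0.5.
    """
    if not 0.0 <= p < 0.5:
        raise ValueError("p must satisfy 0 <= p < 0.5")
    # Area from 0 to x equals Phi(x) - 1/2, hence x = Phi^{-1}(1/2 + p).
    return _STANDARD_NORMAL.inv_cdf(0.5 + p)


if __name__ == "__main__":
    # Row label + column heading gives p, e.g. .47 + .005 = .475.
    for p in (0.200, 0.250, 0.475, 0.495, 0.499):
        print(f"{p:.3f}  {inverse_normal_integral(p):.4f}")
```

The table gives x to four decimals below 1 and to three decimals above 1, so the output should be compared at the table's own precision. Recomputing entries in this way is one means of checking a transcribed table for misprints.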






Index of Names 


BERNOULLI, Jacob, 56
BERNSTEIN, Felix, 116, 162
BERNSTEIN, Serge, 116, 162
BERTRAND, Joseph, 339
BIRKELO, C. C., 196, 197
BOREL, Emile, 13, 339, 343
BORTKIEWICZ, W., 66
BRIDGES, C. B., 163
CHAMBERLAIN, W. E., 196, 197
CHAPMAN, Douglas G., 336, 338, 343
COPELAND, A. H., 4, 13
CRAMER, Harald, 13
DOBZHANSKY, T., 117, 162
DODGE, H. F., 332, 334, 343
DOOB, J. L., 12, 13
DUNN, L. C., 163
FISHER, R. A., 12, 14, 272, 282, 286, 286, 287, 292, 343
GALTON, Sir Francis, 117, 122
GAUSS, Karl Friedrich, 220
GEIRINGER, Hilda, 116, 162
GOSSET, William S. (see also “Student”), 331
HALDANE, J. B. S., 117, 162
HARDY, G. H., 116, 163
HOTELLING, Harold, 12, 14
HOWARD, G. V., 336, 343
JEFFREYS, Harold, 14
KAC, Mark, 204
KENDALL, M. G., 13, 14
KEYNES, J. M., 14
LAPLACE, Pierre-Simon de, 12, 14, 219, 227, 234, 339, 343
LEHMER, Emma,
LOWAN, Arnold N., 221, 249
MARKOFF, A., 12, 14
MENDEL, G., 96
MÉRÉ, Chevalier de, 4, 5, 6, 9, 10, 12, 13, 24, 26, 27, 181, 259, 260, 262, 264
MISES, Richard von, 4, 14
MOLINA, E. C., 215, 249
MORGAN, T. H., 114, 163
MULLER, H. J., 163
NEWTON, Sir Isaac, 39, 40, 41, 43, 44, 64
NEYMAN, J., 1, 12, 14, 339, 343
PASCAL, B., 4, 5, 6
PEARSON, Egon S., 12, 14, 343
PEARSON, Karl, 12, 14, 117, 122, 329, 331, 343
PHELPS, P. S., 196, 197
POLYA, G., 343
RANDOLPH, J. F., 204
ROMIG, H. R., 332, 334, 343
SCHEFFÉ, Henry, 28
SCHOOLS, P. E., 196, 197
SHEWHART, Walter A., 332, 343
SINNOTT, E. W., 163
“STUDENT” (see also Gosset), 12, 14, 332
STURTEVANT, A. H., 163
TODHUNTER, I., 4, 14
WALD, Abraham, 4, 12, 14, 342, 343
WRIGHT, Sewall, 117, 163
YERUSHALMY, Jacob, 196, 197
ZACKS, D., 196, 197





Index of Terms 


Remark: Many terms used in this book appear on a great many pages.
Thus, for example, the term “probability” is mentioned very frequently
throughout the book, first as a subject of study and then as an element in
studying other concepts. Pages listed in the present Index refer to
instances where a given concept is the subject of discussion for its own sake.
Also, the number of page references given for any one term is limited to five.


Absolute probability, 21, 56, 63, 76, 109
Addition theorem on probabilities, 49, 53, 54, 65, 66
Admissible hypotheses, set of, 10, 11, 254, 258, 260
Argument of a function, 165, 166, 174, 222, 224
Arrangements (see: ordered groups), 34, 36, 37
Assortative mating, 117, 118, 122

Best critical region, 304, 305, 308, 310, 311
Binomial formula (see: Newton’s expansion)
Binomial variable, 180, 211, 251, 326, 328

Chromosomes, 96, 97, 98, 100, 137
Combinations (see: unordered groups), 34, 35, 38
Complete independence, 59, 60, 61, 62, 63
Composite statistical hypotheses, 253, 257, 259, 260, 275
Conditional probability, 20, 21, 75, 85
Confidence intervals, 12
Contagious distributions, 291, 292
Continuous function, 234
Critical region, 264, 265, 266, 267, 268

Definite integral, 234, 236
Diagnosis, 196, 271
Direct method of computing probabilities, 27, 182
Distribution function, 167, 168, 170, 319, 320
Distribution problem, 317, 318, 320, 343
Dominant (see: genes)
Duhamel lemma, 238, 240, 241, 243, 245

Elimination mating, 190, 191, 192, 193, 194
Equivalent properties, 46, 66, 82, 83, 85
Estimation, 12
Event point (see: sample point)

Factorial, 33, 34
First kind of error in testing hypotheses, 263, 264, 265, 266, 275
Fish-tagging experiments, 335, 338
Frequency distribution, 164, 168
Frequency function, 167, 195, 250, 319, 320
Function, 164, 165, 166, 167, 168
Fundamental lemma, 305
Fundamental probability set, 15, 16, 17, 20, 24

Gauss-Laplace integral (see: Normal integral)
Genes, dominant, recessive, 96, 98, 99, 100, 102
Genetics, 96, 100
Geometric progression, 81, 82, 83, 84

Hypergeometric variable, 200, 202, 211, 331, 335
Hypothesis tested, 259, 260, 263, 264, 266

Independence, stochastic, 55, 56, 57, 58, 59
Induction, mathematical, 60, 130
Inductive behavior, 1, 2, 5, 258, 259
Industrial sampling inspection, 202, 331, 334

Lady tasting tea, 272, 277, 282, 285, 290
Lambda principle, 338, 340, 341




Lambda star criterion, 305, 310, 312, 313, 314
Lambda star criterion, working form, 314, 315, 318, 319, 320
Laplace’s theorem, 219, 220, 222, 227, 234
Level of significance, 264, 265, 266, 267, 275
Limit, 204, 211, 212, 213, 216
Logical product, 45, 49, 52, 100, 106
Logical sum, 45, 49, 60, 61, 82

Mathematical model, 70, 72, 76, 77, 78
Mathematical statistics, scope of, 11
Most probable value, 177, 178, 182, 183, 186
Multiplication symbol, 33
Multiplication theorem on probabilities, 51, 52, 53, 62, 63

Natural logarithms, base of, 204, 205, 235
Necessity, 150, 151, 152, 153
Newton’s expansion, 39, 40, 41, 43, 180
Normal integral, 220, 221, 222, 223, 225
Normal integral, rule of using, 224, 329
Normal probability density function, 220, 231, 234
Normalized binomial variable, 227, 229
Null hypothesis (see: hypothesis tested)

Ordered groups (see: arrangements)

Panmixia, 117, 118, 119, 120, 121
Paradoxes of probability, 53, 54, 68
Performance characteristic, 7, 8, 11, 267, 268
Permanency, 1, 2, 12, 96, 102
Permutations, 34, 35, 38
Poisson law, variable, 213, 214, 215, 216, 291
Possible sample point, 255, 256, 257, 264, 266
Power, 268, 271, 329
Power function, 264, 267, 268, 276, 280
Probability, 15, 16, 17, 18, 19
Probability law, 167, 168, 191

Randomization, 292, 293
Randomness, 9, 13, 75, 101, 195
Random variable, 164, 166, 167, 168, 169
Rate of risk, crude, net, 71, 72, 73, 76, 77
Recessive (see: genes)
Relative probability, 20, 21, 25, 55, 56
Reproductive cell, 97, 98, 99, 100, 102
Riemann sum, 235, 236, 240, 245, 246
Risks, competing, 69, 70, 71, 72, 73

Sample point, 255, 256, 257, 258, 265
Sample space, 255, 256, 257, 258, 264
Second kind of error in testing hypotheses, 263, 264, 265, 266, 267
Selection, 115, 116, 123, 128, 130
Simple statistical hypothesis, 253, 254, 255, 257, 258
Stable distribution, stability, 149, 150, 151, 152, 153
Standard family of B.C.R.’s, 321, 323, 325, 326, 327
Standardized binomial variable, 227
Statistical decision function, 10, 12
Statistical hypotheses, theory of testing, 12, 250, 251, 252, 253
Stirling’s formula, 207, 209, 210, 216, 240
Stochastic (see: independence)
Sufficiency, 150, 151, 152, 153
Summation symbol, 31, 32, 33

Tables of Normal integral, 221, 222, 225, 344, 346
Taylor formula, 234, 243
Tests of statistical hypotheses, 250, 258, 259, 261, 263
Toxicity of the drug, 262, 263, 264
Transition probabilities, 76, 90
Tuberculosis, screening for, 268, 294

Uniformly most powerful tests, 324, 325, 326, 327, 328
Unordered groups (see: combinations), 34, 35, 39, 200

Weighted binomial, 195, 197

Zoological populations, 202, 336




